Constraint exclusion won't exclude parent table

From: Tim Kane <tim(dot)kane(at)gmail(dot)com>
To: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Constraint exclusion won't exclude parent table
Date: 2014-05-13 20:02:16
Message-ID: CF9838D8.7EF3D%tim.kane@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi all,

First some background.
I have inherited a system that appears to have a lot of logic built into
views upon views upon views (and then some more views for good measure).
It struck me that the CASE conditions built into those views are causing
poorer performance than expected – so I thought I would run a few tests
against the base tables see where the difference lies.

Anyway, that’s all by the by.. Because what I found on my travels is that
the parent table of the relevant partitions is being included and appended
in the query plan.
This is all documented and I understand why, fine. But the impact of this
is greater than I was expecting.

For instance, if I query the partition directly (for all tuples it contains)
versus a query that targets the same partition via exclusion rules - I find
the direct query runs in less than half the time.

Direct query:
Seq Scan on partitioned.ts_201405 track_streams (cost=0.00..4167467.56
rows=65067252 width=253) (actual time=0.010..96796.053 rows=65328073
loops=1)
Output:
Filter:
Buffers: shared hit=354 read=2215096
Total runtime: 137437.675 ms
(5 rows)

Indirect query:
Result (cost=0.00..4167467.56 rows=65067253 width=253) (actual
time=0.011..250057.941 rows=65328073 loops=1)
Output:
Buffers: shared hit=322 read=2215128
-> Append (cost=0.00..4167467.56 rows=65067253 width=253) (actual
time=0.010..163452.326 rows=65328073 loops=1)
Buffers: shared hit=322 read=2215128
-> Seq Scan on archive.ts (cost=0.00..0.00 rows=1 width=199)
(actual time=0.001..0.001 rows=0 loops=1)
Output:
Filter:
-> Seq Scan on partitioned.ts_201405 (cost=0.00..4167467.56
rows=65067252 width=253) (actual time=0.006..85883.925 rows=65328073
loops=1)
Output:
Filter:
Buffers: shared hit=322 read=2215128
Total runtime: 289238.187 ms
(13 rows)

So what is the append node actually doing, and why is it necessary?
I expect that it simply does what it says, and appends the results of those
two seq-scans. But in reality, there isn’t a lot to do there. While I
expect a little bit of overhead, surely it just passes the tuples straight
through to the result node and that will be that.. No?

(yeah, I’ve made a few assumptions/guesses here, but I’m not sure I’m ready
to look at the code just yet)

Cheers,

Tim

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tim Kane 2014-05-13 20:08:09 Adaptive query execution
Previous Message Jeff Janes 2014-05-13 17:12:46 Re: Specifications for a new server