Re: [HACKERS] path toward faster partition pruning

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: jesper(dot)pedersen(at)redhat(dot)com
Cc: Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] path toward faster partition pruning
Date: 2017-11-29 01:24:08
Message-ID: 3c2dd040-b5bc-0aac-f29d-7e65e2e0b4f3@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Jesper.

On 2017/11/28 3:30, Jesper Pedersen wrote:
> Hi Amit,
>
> On 11/24/2017 12:00 AM, Amit Langote wrote:
>>> On 2017/11/23 3:56, Jesper Pedersen wrote:
>>> EXPLAIN (ANALYZE) SELECT t1.a, t1.b, t2.c, t2.d FROM t1 INNER JOIN t2 ON
>>> t2.c = t1.b WHERE t2.d = 1;
>>>
>>> I just wanted to highlight that the "JOIN ON" partition isn't pruned - the
>>> "WHERE" one is.
>>
>> Did you mean to write ON t2.d = t1.b?  If so, equivalence class mechanism
>> will give rise to a t1.b = 1 and hence help prune t1's partition as well:
>>
>
> No, I meant 't2.c = t1.b'. If you take the same example, but don't
> partition you will get the following plan:
>
> test=# EXPLAIN (COSTS OFF) SELECT t1.a, t1.b, t2.c, t2.d FROM t1 INNER
> JOIN t2 ON t2.c = t1.b WHERE t2.d = 1;
>                   QUERY PLAN
> ----------------------------------------------
>  Nested Loop
>    ->  Index Scan using idx_t2_d on t2
>          Index Cond: (d = 1)
>    ->  Index Only Scan using idx_t1_b_a on t1
>          Index Cond: (b = t2.c)
> (5 rows)

So it appears to me that you're pointing out the inner Index Only Scan on
t1, which is lot better than scanning all of t1 on every loop iteration.

As you might know, we can't exactly have the index scan on partitioned
table (that is, the parent table itself), because there wouldn't be any
index on it. However, the planner is smart enough to push the clause down
to partitions (leaf tables) which may have the index and hence index scan
could be chosen for them. But note that planner will have chosen *all*
partitions, because there is no constant value to prune partitions with at
that point.

If we get run-time pruning [1], we get to get almost close to what happens
in the non-partitioned case. In this case, since t1.b of t2.c = t1.b is
the partition key of t1, we will make an Append node with run-time pruning
enabled. On every loop iteration, t2.c's value will be used to prune
useless partitions, which will leave us in most cases to scan just one
partition and it might be an Index Only Scan using the partition's index.

> Maybe "5.10.2. Declarative Partitioning" could be expanded to include some
> general "guidelines" of where partition based plans should be checked
> against their non-partition counterparts (at least the first bullet in
> 5.10.1 says ".. in certain situations .."). Probably a separate patch from
> this.

I agree about shedding more light on that in the documentation. I will
try to write up a patch someday.

Thanks,
Amit

[1] https://commitfest.postgresql.org/15/1330/

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2017-11-29 01:37:07 Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem
Previous Message Masahiko Sawada 2017-11-29 01:14:11 Re: Fix a typo in xact.c