Re: Partition-wise join for join between (declaratively) partitioned tables

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Partition-wise join for join between (declaratively) partitioned tables
Date: 2017-08-10 06:23:04
Message-ID: CAFjFpRfa6_n10cn3vXjN9hdTqneH6A1rfnLXy0PnCP63T2putw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Aug 10, 2017 at 9:28 AM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> On Thu, Aug 10, 2017 at 1:39 AM, Thomas Munro
> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>> On my computer it took ~1.5 seconds to plan a 1000 partition join,
>> ~7.1 seconds to plan a 2000 partition join, and ~50 seconds to plan a
>> 4000 partition join. I poked around in a profiler a bit and saw that
>> for the 2000 partition case I spent almost half the time in
>> create_plan->...->prepare_sort_from_pathkeys->find_ec_member_for_tle,
>> and about half of that was in bms_is_subset. The other half the time
>> was in query_planner->make_one_rel which spent 2/3 of its time in
>> set_rel_size->add_child_rel_equivalences->bms_overlap and the other
>> 1/3 in standard_join_search.
>
> Ashutosh asked me how I did that. Please see attached. I was
> explaining simple joins like SELECT * FROM foofoo JOIN barbar USING
> (a, b). Here also is the experimental hack I tried when I saw
> bitmapset.c eating my CPU.
>

On my machine I observed following planning times
1000 partitions, without partition-wise join, 100ms; with
partition-wise join 500ms
2000 partitions, without partition-wise join, 320ms; with
partition-wise join 2.2s
4000 partitions, without partition-wise join, 1.3ms; with
partition-wise join 17s

So, even without partition-wise join the planning time increases at a
superlinear rate with the number of partitions.

Your patch didn't improve planning time without partition-wise join,
so it's something good to have along-with partition-wise join. Given
that Bitmapsets are used in other parts of code as well, the
optimization may affect those parts as well, esp. the overhead of
maintaining first_non_empty_wordnum.

The comment at the beginning of the file bitmapset.c says
3 * bitmapset.c
4 * PostgreSQL generic bitmap set package
5 *
6 * A bitmap set can represent any set of nonnegative integers, although
7 * it is mainly intended for sets where the maximum value is not large,
8 * say at most a few hundred.

When we created thousands of children, we have certainly crossed the
few hundred threashold. So, there may be other optimizations possible
there. Probably we should leave that out of partition-wise join
patches. Do you think we solving this problem is a prerequisite for
partition-wise join? Or should we propose that patch as a separate
enhancement?

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Craig Ringer 2017-08-10 06:38:46 Re: WIP: Failover Slots
Previous Message Michael Paquier 2017-08-10 06:08:36 Re: Timing-sensitive case in src/test/recovery TAP tests