Re: [PoC] Reducing planning time when tables have many partitions

From: Yuya Watari <watari(dot)yuya(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, jian he <jian(dot)universality(at)gmail(dot)com>, Alena Rybakina <lena(dot)ribackina(at)yandex(dot)ru>, Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Thom Brown <thom(at)linux(dot)com>, Zhang Mingli <zmlpostgres(at)gmail(dot)com>
Subject: Re: [PoC] Reducing planning time when tables have many partitions
Date: 2025-04-04 08:42:51
Message-ID: CAJ2pMka+FO=XFW9FC01AUFZA5gXE1e_6=6sdpt7khD-7NSHrMA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello David,

Thank you very much for your continuous contributions to this patch
series, and especially for providing these new patches despite the
time constraints.

On Fri, Apr 4, 2025 at 3:04 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>
> On Fri, 4 Apr 2025 at 00:34, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> > I've attached 2 patches, which I think addresses most of this, aside
> > from the last point.
> >
> > These do need more work. I've just attached what I have so far before
> > I head off for the day. I am planning on running some performance
> > tests tomorrow and doing a round on the comments.
>
> I've done some further work on this, mostly relating to the code
> comments. I also removed the now-empty
> dispose_eclass_member_iterator() function.

I agree with this new approach. It significantly simplifies the
overall architecture of the patch series while still maintaining
excellent performance. Thank you again for your effort here.

> A couple of things which I'm still uncertain of:
>
> 1. How to handle the ec_childmembers array in _outEquivalenceClass().
> There's no field to know the size of the array. Maybe I should add one
> and then print out the non-empty lists.

I'm also not certain about the best solution here. As you suggested,
adding a field representing the array size to EquivalenceClass seems
like a reasonable approach.

> 2. When processing RELOPT_OTHER_JOINREL in add_child_eq_member(), I'm
> adding the member to each List for all individual relid mentioned in
> child_relids. This will result in the member going on multiple Lists
> and cause the iterator to possibly return the member multiple times.
> That might matter in a few places, e.g.
> generate_join_implied_equalities_normal() keeps some scoring based on
> the number of members.
>
> For #2, Yuya's Bitmapset approach didn't suffer from this issue as the
> Bitmapsets would be unioned to get the non-duplicative members. I
> wondered about doing list_append_unique() instead of lappend() in
> generate_join_implied_equalities_normal(). Unsure. The only other
> thing I can think of is to do something else with members for
> RELOPT_OTHER_JOINREL and store them elsewhere.

Another approach I have in mind is adding an iterator pointer to each
EquivalenceMember to track the iterator that last returned each
member. When the iterator is about to return a member, it would first
check if that member's iterator pointer matches the current iterator.
If it does, we know this member has already been returned, so we skip
it. However, this approach does not work when iterators are called
recursively and leads to increased complexity in the data structure.
Your proposed solution using list_append_unique() instead of lappend()
seems practical since the number of EquivalenceMembers handled in
generate_join_implied_equalities_normal() is usually limited.

> I also did some benchmarking using the attached script. I've attached
> the results of running that on my AMD Zen2 machine. See the end of the
> script for the CREATE TABLE statement for loading that into postgres.
>
> The results look pretty good. v37 came out slightly faster than v36,
> either noise or because of dispose_eclass_member_iterator() removal.

Thank you for running your benchmarks as well. Your results look
promising, demonstrating both reduced planning time and lower memory
consumption.

I have also conducted benchmarks using queries A and B, which I have
used previously and are in [1]. Here is a quick summary:

* The new patch (v37) shows better performance improvements compared
to previous versions (v35 and v36).
* The performance gains are significant and worth committing.
* Performance regressions are negligible or non-existent, even with a
small number of partitions.
* Memory usage in v37 is lower than v35 and almost identical to the master.

Detailed results are as follows:

The following tables and attached figures indicate that v37 achieves
up to 415.4% and 280.3% speedups for queries A and B, respectively.
These improvements are better than those seen in v35 and v36.

Importantly, v37 does not appear to introduce any regressions. Its
speedups exceeded 100% in all tested cases except for the one with two
partitions in query A. Even in that case, the performance remained at
99.9% of the master, demonstrating that the regression is negligible.

Moreover, Table 5 and the attached figure show v37 consumes no
additional memory compared to the master.

Table 1: Planning time for query A (ms)
-------------------------------------------
n | Master | v35 | v36 | v37
-------------------------------------------
1 | 0.274 | 0.273 | 0.274 | 0.270
2 | 0.285 | 0.288 | 0.286 | 0.286
4 | 0.381 | 0.378 | 0.368 | 0.372
8 | 0.477 | 0.468 | 0.471 | 0.471
16 | 0.698 | 0.671 | 0.667 | 0.650
32 | 1.251 | 1.190 | 1.169 | 1.149
64 | 2.848 | 2.550 | 2.463 | 2.444
128 | 6.051 | 4.692 | 4.669 | 4.588
256 | 16.812 | 10.851 | 10.784 | 10.742
384 | 30.985 | 16.640 | 16.354 | 16.243
512 | 50.548 | 23.174 | 22.981 | 22.940
640 | 72.046 | 28.725 | 28.679 | 28.296
768 | 102.668 | 34.975 | 34.759 | 34.280
896 | 150.563 | 46.764 | 46.313 | 46.006
1024 | 197.559 | 48.243 | 47.777 | 47.553
-------------------------------------------

Table 2: Speedup of query A (higher is better)
---------------------------------
n | v35 | v36 | v37
---------------------------------
1 | 100.6% | 100.2% | 101.5%
2 | 99.2% | 99.9% | 99.9%
4 | 100.6% | 103.3% | 102.3%
8 | 101.8% | 101.2% | 101.2%
16 | 104.0% | 104.6% | 107.4%
32 | 105.1% | 107.0% | 108.9%
64 | 111.7% | 115.6% | 116.5%
128 | 129.0% | 129.6% | 131.9%
256 | 154.9% | 155.9% | 156.5%
384 | 186.2% | 189.5% | 190.8%
512 | 218.1% | 220.0% | 220.4%
640 | 250.8% | 251.2% | 254.6%
768 | 293.5% | 295.4% | 299.5%
896 | 322.0% | 325.1% | 327.3%
1024 | 409.5% | 413.5% | 415.4%
---------------------------------

Table 3: Planning time for query B (ms)
------------------------------------------
n | Master | v35 | v36 | v37
------------------------------------------
1 | 12.300 | 12.419 | 12.219 | 12.209
2 | 11.741 | 11.761 | 11.652 | 11.639
4 | 12.573 | 12.376 | 12.390 | 12.418
8 | 13.653 | 13.242 | 13.074 | 13.081
16 | 15.693 | 14.717 | 14.503 | 14.416
32 | 20.957 | 17.890 | 17.732 | 17.675
64 | 35.914 | 25.772 | 25.633 | 25.495
128 | 79.154 | 42.826 | 42.441 | 42.407
256 | 243.880 | 88.246 | 87.626 | 87.011
------------------------------------------

Table 4: Speedup of query B (higher is better)
--------------------------------
n | v35 | v36 | v37
--------------------------------
1 | 99.0% | 100.7% | 100.7%
2 | 99.8% | 100.8% | 100.9%
4 | 101.6% | 101.5% | 101.2%
8 | 103.1% | 104.4% | 104.4%
16 | 106.6% | 108.2% | 108.9%
32 | 117.1% | 118.2% | 118.6%
64 | 139.4% | 140.1% | 140.9%
128 | 184.8% | 186.5% | 186.7%
256 | 276.4% | 278.3% | 280.3%
--------------------------------

Table 5: Memory usage (MB)
(n: number of partitions per table; PWJ: partition-wise join)
----------------------------------------------------------------
Query | n | PWJ | Master | v35 | v36 | v37
----------------------------------------------------------------
A | 1024 | OFF | 48.138 | 49.606 | 48.341 | 48.341
A | 1024 | ON | 127.483 | 128.952 | 127.687 | 127.687
B | 256 | OFF | 92.507 | 96.882 | 92.632 | 92.632
B | 256 | ON | 5803.316 | 5807.691 | 5803.441 | 5803.441
----------------------------------------------------------------

Again, I greatly appreciate your taking the time to significantly
improve this patch. I'd also like to thank Tom once again for his
valuable feedback, which greatly contributed to these improvements.

[1] https://www.postgresql.org/message-id/CAJ2pMkYcKHFBD_OMUSVyhYSQU0-j9T6NZ0pL6pwbZsUCohWc7Q@mail.gmail.com

--
Best regards,
Yuya Watari

Attachment Content-Type Size
figure.png image/png 171.4 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2025-04-04 08:46:57 Re: [PoC] Reducing planning time when tables have many partitions
Previous Message Daniel Gustafsson 2025-04-04 08:39:44 Re: Making sslrootcert=system work on Windows psql