Quadratic planning time for ordered paths over partitioned tables

From: Alexander Kuzmenkov <akuzmenkov(at)timescale(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Alexander Korotkov <akorotkov(at)postgresql(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>, David Rowley <drowley(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Quadratic planning time for ordered paths over partitioned tables
Date: 2025-01-22 15:44:01
Message-ID: CALzhyqzmZ6b9Sbp4jCORJV48t4kz2a1EqR3z4DUxX4RfRYR2xA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi hackers,

There's currently an unfortunate CPU sink in the planning for
partitioned tables. It happens both for the declarative partitioning,
and for the partitioning through inheritance like we use in
TimescaleDB.

The gist of it is that there's a linear search of the EM corresponding
to the given child relation in find_ec_member_matching_pathkeys(). For
a partitioned table, this adds up to a time quadratic in the number of
partitions, which can make some queries unusable after you hit 1k
partitions or so. The usual callers are prepare_sort_from_pathkeys()
or generate_join_implied_equalities().

There's a very simple fix for this, that makes use of the fact that
these EMs are normally looked up in the partitioning order. We can
cache the position of the previously found EM in a static variable in
this function, which makes this search constant-time for the typical
scenarios.

This is arguably a hack, but a simple one. The problem is really
unfortunate and is a subject of many complaints from our customers.
There was another attempt to fix this in [1], but it's been going on
for years without reaching a committable shape. This patch proposes a
quick and dirty alternative that can be adopted until the more
principled solution is worked out.

I did a test with 10k partitions, and this patch makes planning there
3.5 times faster (2.4 s -> 0.7 s):

create table t(x int) partition by range(x);

select format('create table t_%s partition of t for values from (%s)
to (%s)', x, x * 10000, (x + 1) * 10000) from generate_series(0, 9999)
x \gexec

insert into t select generate_series(0, 10000 * 9999);
vacuum analyze t;

set max_parallel_workers_per_gather = 0;
set enable_partitionwise_aggregate = 1;
set enable_hashagg = 0;
insert into t select generate_series(0, 10000 * 9999);

explain select count(*) from t group by x;

The query is supposed to generate a partitionwise aggregation over
sort like this:
Append
-> GroupAggregate
Group Key: t.x
-> Sort
Sort Key: t.x
-> Seq Scan on t_0 t
-> GroupAggregate
Group Key: t_1.x
-> Sort
Sort Key: t_1.x
-> Seq Scan on t_1
...

Would be glad to hear your opinions on this. CCing some committers who
were recently involved with partitioning.

References:
1: https://www.postgresql.org/message-id/flat/CAJ2pMkZNCgoUKSE%2B_5LthD%2BKbXKvq6h2hQN8Esxpxd%2Bcxmgomg%40mail.gmail.com

---
Alexander Kuzmenkov
Timescale

Attachment Content-Type Size
v1-0001-Fix-quadratic-equivalence-member-search-for-parti.patch text/x-patch 2.9 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2025-01-22 15:50:59 Re: Pre-allocating WAL files
Previous Message Shlok Kyal 2025-01-22 15:41:55 Re: create subscription with (origin = none, copy_data = on)