Very long query planning times for database with lots of partitions

From: Mickael van der Beek <mickael(at)woorank(dot)com>
To: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Very long query planning times for database with lots of partitions
Date: 2019-01-22 13:44:29
Message-ID: CAEQRsAeU-UHM8EE7dBz+EGWpQkYLK5SgkAiTZK5-zcZSDcWccw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hey everyone,

I have a PostgreSQL 10 database that contains two tables which both have
two levels of partitioning (by list and using a single value). Meaning that
a partitioned table gets repartitioned again.

The data in my use case is stored on 5K to 10K partitioned tables (children
and grand-children of the two tables mentioned above) depending on usage
levels.

Three indexes are set on the grand-child partition. The partitioning
columns are not covered by them.
(I don't believe that it is needed to index partition columns no?)

With this setup, I experience queries that have very slow planning times
but fast execution times.
Even for simple queries where only a couple partitions are searched on and
the partition values are hard-coded.

Researching the issue, I thought that the linear search in use by
PostgreSQL 10 to find the partition table metadata was the cause.

cf: https://blog.2ndquadrant.com/partition-elimination-postgresql-11/

So I decided to try ou PostgreSQL 11 which included the two aforementioned
fixes:

-
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=499be013de65242235ebdde06adb08db887f0ea5
-
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=9fdb675fc5d2de825414e05939727de8b120ae81

Helas, it seems that the version update didn't change anything.
I ran an `ANALYZE` before doing my tests so I believe that the statistics
are calculated and fresh.

Now I know that PostgreSQL doesn't like having lots of partitions but I
still would like to understand why the query planner is so slow in
PostgreSQL 10 and PostgreSQL 11.
(I was also wondering what "a lot" of partitions is in PostgreSQL? When I
look at use cases of extensions like TimescaleDB, I would expect that 5K to
10K partitions wouldn't be a whole lot.)

An example of a simple query that I run on both PostgreSQL version would be:

EXPLAIN ANALYZE
> SELECT
> table_a.a,
> table_b.a
> FROM
> (
> SELECT
> a,
> b
> FROM
> table_a
> WHERE
> partition_level_1_column = 'foo'
> AND
> partition_level_2_column = 'bar'
> )
> AS table_a
> INNER JOIN
> (
> SELECT
> a,
> b
> FROM
> table_b
> WHERE
> partition_level_1_column = 'baz'
> AND
> partition_level_2_column = 'bat'
> )
> AS table_b
> ON table_b.b = table_a.b
> LIMIT
> 10;

Running this query on my database with 5K partitions (split roughly 2/3rds
of the partitions for table_b and 1/3rd of the partitions for table_a) will
return:

- Planning Time: 7155.647 ms
- Execution Time: 2.827 ms

Thank you in advance for your help!

Mickael

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Justin Pryzby 2019-01-22 14:02:05 Re: Very long query planning times for database with lots of partitions
Previous Message Rangaraj G 2019-01-22 12:54:06 RE: Memory and hard ware calculation :