Re: Very long query planning times for database with lots of partitions

From: Mickael van der Beek <mickael(at)woorank(dot)com>
To: Steven Winfield <Steven(dot)Winfield(at)cantabcapital(dot)com>, pryzby(at)telsasoft(dot)com
Cc: "pgsql-performance(at)lists(dot)postgresql(dot)org" <pgsql-performance(at)lists(dot)postgresql(dot)org>
Subject: Re: Very long query planning times for database with lots of partitions
Date: 2019-01-22 15:24:20
Message-ID: CAEQRsAfD9CSCWcS=_K0Pe52j80+HiF69YEUvPtE10KPvbDnFOQ@mail.gmail.com
Lists: pgsql-performance

Thank you both for your quick answers.

@Justin, your answer seems to confirm that partitioning, or at least
partitioning this heavily, is not the right direction to take.
The reason I originally wanted to use partitioning was that I'm storing a
multi-tenant graph and that as the data grew, so did the indexes and once
they were larger than the available RAM, query performance went down the
drain.
The two levels of partitioning let me create one level for the tenant-level
partitioning and one level for the business logic where I could further
partition the tables into the different types of nodes and edges I was
storing.
(These are table_a and table_b in my example query. There is also a table_c
which connects table_a and table_b, but I wanted to keep it simple.)
Another reason was that we do regular, automated cleanups of the data and
dropping all the data (hundreds of thousands of rows) for a tenant is very
fast with DROP TABLE of a partition and rather slow with a regular DELETE
query (even if indexed).
With the redesign of the database schema (that included the partitioning
changes), I also dramatically reduced the amount and size of data per row
on the nodes and edges by storing the large and numerous metadata fields on
separate tables that are not part of the graph traversal process.
Based on the usage numbers I see, I would expect around 12K tenants in the
medium term, which means that even partitioning per tenant on those two
tables would lead to 24K partitions, which is way above your approximate
limit of 1K partitions.
Queries are always limited to one tenant's data which was one of the
motivations behind partitioning in the first place.
Not sure what you would advise in this case for a multi-tenant graph?
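
A minimal sketch of the two-level layout and the partition-count arithmetic
described above. The column names (tenant_id, kind), the partition naming
scheme, and the helper functions are hypothetical and only for illustration;
the real schema is not shown in this thread. The generated statements use
PostgreSQL declarative partitioning syntax.

```python
# All identifiers below (tenant_id, kind, the "_t<id>" naming scheme) are
# assumptions made for illustration, not the actual schema.


def tenant_partition_ddl(parent, tenant_id, kinds):
    """Return DDL for one tenant-level partition, sub-partitioned by kind."""
    tenant_part = f"{parent}_t{tenant_id}"
    ddl = [
        # Level 1: one partition per tenant, itself partitioned by kind.
        f"CREATE TABLE {tenant_part} PARTITION OF {parent} "
        f"FOR VALUES IN ({tenant_id}) PARTITION BY LIST (kind);"
    ]
    for kind in kinds:
        # Level 2: one leaf partition per node/edge kind.
        ddl.append(
            f"CREATE TABLE {tenant_part}_{kind} PARTITION OF {tenant_part} "
            f"FOR VALUES IN ('{kind}');"
        )
    return ddl


def tenant_cleanup_ddl(parent, tenant_id):
    """Return the statement used to remove one tenant's partition."""
    return f"DROP TABLE {parent}_t{tenant_id};"


def tenant_level_partitions(tenants, tables):
    """Number of tenant-level partitions, ignoring any second level."""
    return tenants * tables


if __name__ == "__main__":
    for statement in tenant_partition_ddl("table_a", 42, ["page", "link"]):
        print(statement)
    print(tenant_cleanup_ddl("table_a", 42))
    # 12K tenants on two tables, as estimated in this message.
    print(tenant_level_partitions(12_000, 2))
```

The last call reproduces the 24K figure: 12,000 tenants times two tables,
before counting any second-level partitions.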

@Steven, yes, constraint_exclusion is set to the default value of
'partition'.
The EXPLAIN ANALYZE output also shows the partitions being pruned
correctly.
So the query plan looks sound and the query execution confirms this.
But reaching that point is really what the issue is for me.
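
One way to make that gap visible is to compare the planning and execution
times that EXPLAIN ANALYZE reports at the end of its output. The sketch below
is only an illustration: the helper name is hypothetical, and the sample text
uses made-up numbers, not measurements from this database.

```python
import re

# Matches the summary lines printed by EXPLAIN ANALYZE. The match is
# case-insensitive because the capitalisation of "time" differs between
# PostgreSQL versions.
_TIME_LINE = re.compile(r"(Planning|Execution) time: ([\d.]+) ms", re.IGNORECASE)


def explain_times(explain_output):
    """Return {'planning': ms, 'execution': ms} for the lines that are present."""
    times = {}
    for label, value in _TIME_LINE.findall(explain_output):
        times[label.lower()] = float(value)
    return times


# Made-up sample, for illustration only.
SAMPLE_OUTPUT = """\
Planning Time: 4000.0 ms
Execution Time: 2.5 ms
"""

if __name__ == "__main__":
    times = explain_times(SAMPLE_OUTPUT)
    share = times["planning"] / (times["planning"] + times["execution"])
    print(times, f"planning share: {share:.1%}")
```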

On Tue, Jan 22, 2019 at 3:07 PM Steven Winfield <
Steven(dot)Winfield(at)cantabcapital(dot)com> wrote:

> Do you have constraint_exclusion set correctly (i.e. ‘on’ or ‘partition’)?
>
> If so, does the EXPLAIN output mention all of your parent partitions, or
> are some being successfully pruned?
>
> Planning times can be sped up significantly if the planner can exclude
> parent partitions, without ever having to examine the constraints of the
> child (and grandchild) partitions. If this is not the case, take another
> look at your query and try to figure out why the planner might believe a
> parent partition cannot be outright disregarded from the query – does the
> query contain a filter on the parent partitions’ partition key, for example?
>
>
>
> I believe Timescaledb has its own query planner optimisations for
> discarding partitions early at planning time.
>
>
>
> Good luck,
>
> Steve.
