Re: [PoC] Partition path cache

From: Andy Fan <zhihuifan1213(at)163(dot)com>
To: Bykov Ivan <I(dot)Bykov(at)modernsys(dot)ru>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PoC] Partition path cache
Date: 2024-10-24 23:51:36
Message-ID: 87seslqctz.fsf@163.com
Lists: pgsql-hackers

Bykov Ivan <I(dot)Bykov(at)modernsys(dot)ru> writes:

> Our customers use databases (1-10 TB) with big tables. Often these tables are
> split into sections. For example, we have tables with almost a thousand
> sections. In most cases, those sections have a similar set of indexes and
> contain similar data. Often such a partitioned table has a multilevel
> structure (for example, a per-year section has per-quarter sections, and each
> per-quarter section in turn has per-month sections that are plain relations).
>
> During the analysis of the planning procedure, we found that the planner
> in PostgreSQL 15.7 spends a lot of time building access paths.
..
> We backported the new access path building algorithms from PostgreSQL 17
> (which optimize match_pathkeys_to_index()), and it had an effect: the
> planner spent 1090 ms planning the query the first time and 320 ms the
> second time.
>
> But we still think that the planner does unnecessary work when building all
> types of paths for every section. So we implemented a feature named
> “partition path cache” (see next section), and now the planner spends 970 ms
> planning the query the first time and 240 ms the second time.
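
For scale, the timings above put the path cache's additional gain over the PostgreSQL 17 backport alone at roughly:

```latex
\frac{1090 - 970}{1090} \approx 11\% \text{ (first planning)}, \qquad
\frac{320 - 240}{320} = 25\% \text{ (second planning)}
```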

>
> Partition path cache
> ====================
> The partition path cache aims to speed up planning for partition scan paths.
>
> Path cache doesn't copy and transform Path nodes directly due to the absence
> of Path node copy functions. The main aim of this patch is to prove the
> assumption that partitions of the same relation that have similar sets of
> indexes may use similar access path types.
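
To make that assumption concrete, here is a minimal C sketch of how such a cache could behave. It is not the PoC patch's code and does not use PostgreSQL planner structures: PartitionShape, PathCache, plan_partition(), demo_build_paths() and the PATH_* flags are all hypothetical. A partition is reduced to the sorted leading key columns of its indexes. The first partition with a given shape gets the full path search, and later partitions with an identical shape only build the path kinds the first one kept.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical sketch of a partition path cache. Not the PoC patch and not
 * PostgreSQL planner API: all names here are made up for illustration.
 */

/* Hypothetical access path kinds, stored as bit flags. */
enum {
    PATH_SEQSCAN       = 1 << 0,
    PATH_INDEXSCAN     = 1 << 1,
    PATH_BITMAPSCAN    = 1 << 2,
    PATH_INDEXONLYSCAN = 1 << 3
};
#define ALL_PATH_KINDS \
    (PATH_SEQSCAN | PATH_INDEXSCAN | PATH_BITMAPSCAN | PATH_INDEXONLYSCAN)

#define MAX_INDEXES 8
#define CACHE_SLOTS 64

/* A partition reduced to the leading key column of each index;
 * callers keep index_keys sorted so equal index sets compare equal. */
typedef struct {
    int nindexes;
    int index_keys[MAX_INDEXES];
} PartitionShape;

typedef struct {
    PartitionShape shape;
    unsigned path_kinds;   /* path kinds kept for the first such partition */
} CacheEntry;

typedef struct {
    int nentries;
    CacheEntry entries[CACHE_SLOTS];
} PathCache;

/* Path search restricted to `allowed` kinds; returns the kinds it kept. */
typedef unsigned (*BuildPathsFn)(const PartitionShape *p, unsigned allowed);

static bool shapes_equal(const PartitionShape *x, const PartitionShape *y)
{
    if (x->nindexes != y->nindexes)
        return false;
    for (int i = 0; i < x->nindexes; i++)
        if (x->index_keys[i] != y->index_keys[i])
            return false;
    return true;
}

/* Reuse cached path kinds for an identically indexed partition, else search all. */
static unsigned plan_partition(PathCache *cache, const PartitionShape *p,
                               BuildPathsFn build, bool *cache_hit)
{
    assert(p->nindexes >= 0 && p->nindexes <= MAX_INDEXES);
    for (int i = 0; i < cache->nentries; i++) {
        if (shapes_equal(&cache->entries[i].shape, p)) {
            *cache_hit = true;
            return build(p, cache->entries[i].path_kinds);
        }
    }
    *cache_hit = false;
    unsigned kept = build(p, ALL_PATH_KINDS);
    if (cache->nentries < CACHE_SLOTS) {   /* table full: just stop caching */
        cache->entries[cache->nentries].shape = *p;
        cache->entries[cache->nentries].path_kinds = kept;
        cache->nentries++;
    }
    return kept;
}

/* Hypothetical stand-in for cost-based path selection: always keeps a seq
 * scan, keeps index and bitmap scans when indexes exist, within `allowed`. */
static unsigned demo_build_paths(const PartitionShape *p, unsigned allowed)
{
    unsigned kept = PATH_SEQSCAN;
    if (p->nindexes > 0)
        kept |= PATH_INDEXSCAN | PATH_BITMAPSCAN;
    return kept & allowed;
}
```

Caching path kinds rather than Path nodes mirrors the limitation stated above (no Path node copy functions). The sketch has no invalidation and simply stops adding entries once its fixed table is full.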

This sounds like an interesting idea. I like it because it avoids the need
for a "global statistics" effort for partitioned tables, since it just uses
the first partition it knows. Of course, it has the drawback that the
"first" partition may not be representative of the other partitions.

One objection to this patch might be "What if other partitions have very
different statistics from the first partition?". If I were you, I would
check all the statistics used at this stage and try to find an algorithm
showing that they are similar enough for the best path to be similar
too. This check could be done once, when the statistics are gathered.
However, this might not be easy.
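
One way to approximate such a check is a cheap similarity test between the statistics of the first partition and each candidate partition. The C sketch below is only an illustration: PartStats, rel_diff(), stats_similar() and the tolerance parameter are hypothetical. It looks at just three summary numbers, and passing it does not prove that the best path is the same. That stronger argument is exactly the hard part.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-partition statistics summary (not pg_statistic's layout). */
typedef struct {
    double reltuples;   /* estimated row count */
    double ndistinct;   /* distinct values in the filtered column */
    double null_frac;   /* fraction of NULLs in that column, 0..1 */
} PartStats;

static double abs_d(double x) { return x < 0.0 ? -x : x; }

/* Relative difference |a - b| / max(|a|, |b|); 0 when both are zero. */
static double rel_diff(double a, double b)
{
    double m = abs_d(a) > abs_d(b) ? abs_d(a) : abs_d(b);
    return m == 0.0 ? 0.0 : abs_d(a - b) / m;
}

/*
 * Hypothetical guard: reuse the first partition's path choices only when the
 * candidate's row count and ndistinct are within `tol` relative difference
 * and its null fraction within `tol` absolute difference. A heuristic, not
 * a proof that the cheapest path is the same.
 */
static bool stats_similar(const PartStats *first, const PartStats *cand,
                          double tol)
{
    assert(tol >= 0.0);
    return rel_diff(first->reltuples, cand->reltuples) <= tol &&
           rel_diff(first->ndistinct, cand->ndistinct) <= tol &&
           abs_d(first->null_frac - cand->null_frac) <= tol;
}
```

Because these inputs come from gathered statistics, the result could be computed once when statistics are collected and stored, rather than on every planning cycle.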

--
Best Regards
Andy Fan
