[PoC] Partition path cache

From: Bykov Ivan <I(dot)Bykov(at)modernsys(dot)ru>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: [PoC] Partition path cache
Date: 2024-10-24 08:45:17
Message-ID: d8ca99bb063548e79b2ffe0c9d7c4ee3@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello,

I have a patch that seems to be worth community attention.
It is just a PoC for now, but I wonder if it makes sense to continue
development of the idea.

Preface
=======

My team is developing PostgreSQL fork optimized for high OLTP loads.

Our customers use databases (1-10 TB) with big tables. Often these tables are
big and split on sections. For example, we have tables with almost
thousand sections. In most cases, those sections have a similar set of indexes
and contain similar data. Often this partitioned table has multilevel structure
(for example, a per-year section has a per-quarter section, and a per-quarter
section in turn has a per-month simple relation sections).
During the analysis of the planning procedure, we found that the planner
we found that the planner in PostgreSQL 15.7 spents a lot of time building
access paths.

We also knew that fetching information from Syscache takes time. In our case
on PostgreSQL 15.7, if the planner spent 5.7 sec. for planning the query
at the first time, at the second time it takes 4.4 sec.
So fetching data from pg_catalog is not the main reason for slow planning.

We backported new access path build algorithms from PostgreSQL 17 (which
optimizes match_pathkeys_to_index()) and it takes effect:
planner spent 1090 ms for planning query at first time and 320 ms for
second time.

But we still think that planners make unnecessary jobs when building all
types of paths for every section. So we implemented a feature named
"partition path cache" (see next section), and now planner spent 970 ms for
planning query at the first time and 240 ms for the second time.

The comparison table is given below.

PostgreSQL | first time | second time | execution
version | planning, ms | planning, ms | time, ms
-------------------------------------+---------------+--------------+----------
PostrgeSQL 15.7 | 5713 | 4402 | 55
-------------------------------------+---------------+--------------+----------
PostrgeSQL 15.7 + v17 improvements | 1093 | 320 | 52
-------------------------------------+---------------+--------------+----------
PostrgeSQL 15.7 + v17 improvements + | 967 | 240 | 52
partition path cache | | |

Partition path cache
====================
The partition path cache aims to speed up planning for partition scan paths.

Path cache doesn't copy and transform Path nodes directly due to the absence of
Path nodes copy functions. The main aim of this patch is to prove the assumption
that partitions of the same relation that have similar sets of indexes may use
similar access path types. The cache optimizes access path search using the
following methods:
- Omit building of paths other than selected path types (if sequence scan is
selected, no index or bitmap scan paths will be built).
- Omit checking for indexes that are not involved in the selected path on
index or bitmap access path building.

Partition identification
------------------------

For selection of similar partitions, two new ID types are calculated:
- part_group_id (field of RelOptInfo) - hash sum of parent relation id,
number of parent relation's partitions, and 'index_group_id' of all
indexes. This ID is calculated only for partitions (not for any other regular
relations) at get_relation_info();
- index_group_id (field of IndexOptInfo) - hash sum of fields at corresponding
pg_index row (indnatts, indnkeyatts, indisunique, indnullsnotdistinct,
indisprimary, indisexclusion, indimmediate, indkey, indexprs, indpred).
This ID is calculated only for partition's indexes at get_index_group_id().

The algorithm
-------------

During construction of an access path for a partitioned relation, the cache
recursively performs the following steps:

On filling up a RelOptInfo structure for a partition that is a regular
relation, planner calculates part_group_id for it (and index_group_id for its
indexes).

If the cache has not been allocated yet it will be created on the first time the
Append Path for (root) a partitioned relation be requested (this level of
recursive access path building process marked as level 1 in the cache control
structure).

When planner builds an access path list for partitions, it uses the cache
only if a parent of the current partition has at least
partition_path_cache_threshold partitions. If cache can be used, planner
seeks a cache entry with the same values of part_group_id and rtable index as
the parent has. If the entry not found, planner builds access paths list using
standard algorithm and stores cheapest_total_path properties (pathtype, and list
of index_group_id for involved indexes) at the cache.

If the entry found, planner builds only a short list of paths (with the same
pathtype and short list of indexes with the same index_group_id). Building of
this access path shortcut can fail, resulting in the path not being added
into the cache (reason may be part_group_id/index_group_id collision that is
unlikely). In case the shortcut creation failed, standard algorithm for index
scans is used (for index scan paths only) with full list of available indexes.
If an access path still not found planner uses standard access path building
algorithm and rewrites content of the current entry in the cache.

As soon as the building process of Append Path for the root relation
finished, the cache are deleted (append path build finished of recursive scan
path build marked as level 1 in the cache control structure).

GUC parameters
--------------

- partition_path_cache_threshold: integer, default 0, minimum 0,
maximum INT32_MAX, minimum number of partitions for use the access path
cache, for disabling the feature, set it to 0 or 1.
- partition_path_cache_debug: boolean, default is off, print debug messages
for partition access path cache (used for tests).

Feature improvements
--------------------

The cache may construct an access path for the current partition from scratch
using path from the first partition of a group as a template.
This improvement can be the next step in development of the feature,
but for now it seems too complicated for the PoC solution.

index_group_id value depends only on pg_index row content, so it may be
calculated once at index creation or altering and stored at new pg_index
column (may be we would better to add "key=value" text[] column as
reloptions at pg_class).

Patch
=====
I attached a patch, which may be applied on top of REL_17_0
(d7ec59a63d745ba74fba0e280bbf85dc6d1caa3e)

Path tests
==========

The following table shows test results. Qureries and graphs are attached.

Testing method
--------------

For every combination of partition quantity and binary version query performed
in a single session at a time. 9 sessions run a single query, and then
single session runs 11 queries. First 10 planning times taken to get average
value at column "first plan time," second 10 planning times-for
"second plan time". All 20 execution times used to get average
value at column "execute time".

I took queries from this thread
(thanks Yuya Watari <watari(dot)yuya(at)gmail(dot)com>):
https://www.postgresql.org/message-id/flat/CAJ2pMkZNCgoUKSE%2B_5LthD%2BKbXKvq6h2hQN8Esxpxd%2Bcxmgomg%40mail.gmail.com

+-------------------+-------+----------------------+----------------------+----------------------+
| | part. | first plan time, ms | second plan time, ms | execute time, ms |
| PostgreSQL ver. | qty. | ( diff vs 17.0, %) | ( diff vs 17.0, %) | ( diff vs 17.0, %) |
+-------------------+-------+----------------------+----------------------+----------------------+
| 17.0 | 2 | 3.1031 | 0.8080 | 2.4993 |
| 17.0 + path cache | | 3.1228 (+0.63 %) | 0.8010 (-0.87 %) | 2.5468 (+1.90 %) |
+-------------------+-------+----------------------+----------------------+----------------------+
| 17.0 | 4 | 4.5636 | 1.1161 | 4.8888 |
| 17.0 + path cache | | 4.6028 (+0.86 %) | 1.1105 (-0.50 %) | 4.9687 (+1.63 %) |
+-------------------+-------+----------------------+----------------------+----------------------+
| 17.0 | 8 | 7.5774 | 1.7725 | 9.6441 |
| 17.0 + path cache | | 7.6790 (+1.34 %) | 1.7703 (-0.12 %) | 9.7673 (+1.28 %) |
+-------------------+-------+----------------------+----------------------+----------------------+
| 17.0 | 12 | 10.7537 | 2.4675 | 14.3911 |
| 17.0 + path cache | | 10.7554 (+0.02 %) | 2.4729 (+0.22 %) | 14.4567 (+0.46 %) |
+-------------------+-------+----------------------+----------------------+----------------------+
Now we reach partition_path_cache_threshold = 16
+-------------------+-------+----------------------+----------------------+----------------------+
| 17.0 | 16 | 13.7407 | 3.1827 | 19.1492 |
| 17.0 + path cache | | 12.4823 (-9.16 %) | 1.8434 (-42.08%) | 19.5624 (+2.16 %) |
+-------------------+-------+----------------------+----------------------+----------------------+
| 17.0 | 24 | 20.0194 | 4.7433 | 28.6824 |
| 17.0 + path cache | | 17.8342 (-10.92%) | 2.5668 (-45.89%) | 29.7417 (+3.69 %) |
+-------------------+-------+----------------------+----------------------+----------------------+
| 17.0 | 32 | 26.7856 | 6.5625 | 38.5652 |
| 17.0 + path cache | | 23.2506 (-13.20%) | 3.3558 (-48.86%) | 38.6931 (+0.33 %) |
+-------------------+-------+----------------------+----------------------+----------------------+
| 17.0 | 48 | 40.1678 | 10.5715 | 57.2730 |
| 17.0 + path cache | | 34.2525 (-14.73%) | 5.1433 (-51.35%) | 57.7580 (+0.85 %) |
+-------------------+-------+----------------------+----------------------+----------------------+
| 17.0 | 64 | 54.1985 | 15.4603 | 76.8278 |
| 17.0 + path cache | | 45.8584 (-15.39%) | 7.4850 (-51.59%) | 77.8530 (+1.33 %) |
+-------------------+-------+----------------------+----------------------+----------------------+
| 17.0 | 96 | 84.8551 | 27.4972 | 115.8411 |
| 17.0 + path cache | | 68.7432 (-18.99%) | 13.0090 (-52.69%) | 116.0379 (+0.17 %) |
+-------------------+-------+----------------------+----------------------+----------------------+
| 17.0 | 128 | 118.6081 | 43.0581 | 155.4576 |
| 17.0 + path cache | | 95.0978 (-19.82%) | 20.0807 (-53.36%) | 158.0958 (+1.70 %) |
+-------------------+-------+----------------------+----------------------+----------------------+
| 17.0 | 256 | 296.5312 | 149.4680 | 322.6102 |
| 17.0 + path cache | | 218.3300 (-26.37%) | 72.0222 (-51.81%) | 325.5720 (+0.92 %) |
+-------------------+-------+----------------------+----------------------+----------------------+
| 17.0 | 384 | 544.4308 | 318.3988 | 487.2509 |
| 17.0 + path cache | | 380.6742 (-30.08%) | 159.0805 (-50.04%) | 489.8570 (+0.53 %) |
+-------------------+-------+----------------------+----------------------+----------------------+
| 17.0 | 512 | 855.6576 | 562.1585 | 699.9267 |
| 17.0 + path cache | | 583.7852 (-31.77%) | 289.3476 (-48.53%) | 692.7862 (-1.02 %) |
+-------------------+-------+----------------------+----------------------+----------------------+
| 17.0 | 768 | 1792.3726 | 1354.0600 | 1179.9674 |
| 17.0 + path cache | | 1296.3144 (-27.68%) | 850.4178 (-37.19%) | 1030.4456 (-12.67%) |
+-------------------+-------+----------------------+----------------------+----------------------+
| 17.0 | 1024 | 3102.6455 | 2496.3959 | 1564.7481 |
| 17.0 + path cache | | 2289.7010 (-26.20%) | 1708.7799 (-31.55%) | 1375.4829 (-12.10%) |
+-------------------+-------+----------------------+----------------------+----------------------+
| 17.0 | 1280 | 4918.5399 | 4093.4833 | 1968.9766 |
| 17.0 + path cache | | 3654.1266 (-25.71%) | 2878.6720 (-29.68%) | 1726.8803 (-12.30%) |
+-------------------+-------+----------------------+----------------------+----------------------+
| 17.0 | 1536 | 7230.7103 | 6405.0386 | 2488.1614 |
| 17.0 + path cache | | 5686.7774 (-21.35%) | 4635.5728 (-27.63%) | 2101.5021 (-15.54%) |
+-------------------+-------+----------------------+----------------------+----------------------+

Attachment Content-Type Size
0001-Partition-path-cache.patch application/octet-stream 88.4 KB
partition-path-cache-lin-2-32.png image/png 91.9 KB
partition-path-cache-lin-2-256.png image/png 104.0 KB
partition-path-cache-lin-2-1536.png image/png 99.1 KB
partition-path-cache-log-2-1536.png image/png 92.5 KB
partition-path-cache-queries.zip application/x-zip-compressed 22 bytes

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message jian he 2024-10-24 08:56:05 execute prepared statement passing parameter expression with COLLATE clause
Previous Message Laurenz Albe 2024-10-24 08:29:02 Re: proposal: schema variables