Re: Optimize commit performance with a large number of 'on commit delete rows' temp tables

From: feichanghong <feichanghong(at)qq(dot)com>
To: wenhui qiu <qiuwenhuifx(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Optimize commit performance with a large number of 'on commit delete rows' temp tables
Date: 2024-07-08 14:17:09
Message-ID: tencent_427567C207B10908071B828A92D8FB638606@qq.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi wenhui,

I carefully analyzed the reason for the performance regression with fewer

temporary tables in the previous patch (v1-0002-): the k_hash_funcs determined

by the bloom_create function were 10(MAX_HASH_FUNCS), which led to an excessive

calculation overhead for the bloom filter.

Based on the calculation formula for the bloom filter, when the number of items

is 100 and k_hash_funcs is 2, the false positive rate for a 1KB bloom filter is

0.0006096; when the number of items is 1000, the false positive rate is

0.048929094. Therefore, k_hash_funcs of 2 can already achieve a decent false

positive rate, while effectively reducing the computational overhead of the

bloom filter.

I have re-implemented a bloom_create_v2 function to create a bloom filter with

a specified number of hash functions and specified memory size.

From the test data below, it can be seen that the new patch in the attachment

(v1-0003-) does not lead to performance regression in any scenario.

Furthermore, the default threshold value can be lowered to 2.

Here is the TPS performance data for different numbers of temporary tables

under different thresholds, as compared with the head (98347b5a). The testing

tool used is pgbench, with the workload being to insert into one temporary

table (when the number of temporary tables is 0, the workload is SELECT 1):

|tablenum&nbsp; &nbsp; &nbsp; |0 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; |1 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; |2&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; |5&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; |10 &nbsp; &nbsp; &nbsp; &nbsp; |100&nbsp; &nbsp; &nbsp; &nbsp; |1000 &nbsp; &nbsp; &nbsp; |

|--------------|------------|------------|-----------|-----------|-----------|-----------|-----------|

|head(98347b5a)|39912.722209|10064.306268|9183.871298|7452.071689|5641.487369|1073.203851|114.530958 |

|threshold-2 &nbsp; |40097.047974|10009.598155|9982.172866|9955.651235|9999.338901|9785.626296|8278.828828|

Here is the TPS performance data for different numbers of temporary tables

at a threshold of 2, compared with the head (commit 98347b5a). The testing tool

is pgbench, with the workload being to insert into all temporary tables:

|table num &nbsp; &nbsp; |1&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; |2&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; | 5 &nbsp; &nbsp; &nbsp; &nbsp; |10 &nbsp; &nbsp; &nbsp; &nbsp; |100 &nbsp; &nbsp; &nbsp; |1000 &nbsp; &nbsp; |

|--------------|-----------|-----------|-----------|-----------|----------|---------|

|head(98347b5a)|7243.945042|5734.545012|3627.290594|2262.594766|297.856756|27.745808|

|threshold-2 &nbsp; |7289.171381|5740.849676|3626.135510|2207.439931|293.145036|27.020953|

I have previously researched the implementation of the Global Temp Table (GTT)

you mentioned, and it have been used in Alibaba Cloud's PolarDB (Link [1]).

GTT can prevent truncation operations on temporary tables that have not been

accessed by the current session (those not in the OnCommitItem List), but GTT

that have been accessed by the current session still need to be truncated at

commit time.Therefore, GTT also require the optimizations mentioned in the

above patch.

[1] https://www.alibabacloud.com/help/en/polardb/polardb-for-oracle/using-global-temporary-tables?spm=a3c0i.23458820.2359477120.1.66e16e9bUpV7cK

Best Regards,
Fei Changhong

Attachment Content-Type Size
v1-0003-Optimize-commit-with-temp-tables-guc.patch application/octet-stream 10.0 KB
v1-0003-Optimize-commit-with-temp-tables-without-guc.patch application/octet-stream 8.9 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2024-07-08 14:17:49 Re: Injection point locking
Previous Message Bertrand Drouvot 2024-07-08 14:07:58 Re: Pluggable cumulative statistics