From: Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>
To: Sami Imseih <samimseih(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, Michael Paquier <michael(at)paquier(dot)xyz>, Greg Sabino Mullane <htamfids(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Sample rate added to pg_stat_statements
Date: 2025-01-22 22:11:55
Message-ID: 0cafe9f5-a5e7-42e4-805f-b182f002ac27@tantorlabs.com
Lists: pgsql-hackers
On 20.01.2025 17:20, Ilia Evdokimov wrote:
>
> On 15.01.2025 20:16, Sami Imseih wrote:
>>> Probably, but first I suggest benchmarking with sampling applied to
>>> all queries. If the results are good, we can later filter certain
>>> queries based on different characteristics.
>> Absolutely. The benchmark numbers to justify this feature are
>> the next step. Thanks for your work on this!
>>
>> Regards,
>>
>> Sami
>>
> Hi hackers,
>
> I’d like to share the results of my benchmark.
>
> To stress the spinlock in pg_stat_statements while incrementing
> counters for popular entries, it's important to use the same query
> repeatedly. To avoid overloading pgss with normalization, the queries
> should not contain constants. I found that using the query 'SELECT
> now()' works best for this purpose.
>
> I ran the benchmark on a machine with 48 CPUs, which may not be
> sufficient to fully test sampling, but I was able to achieve around
> 1.5 million TPS using 'SELECT now()'. To load the CPUs to about
> 85-90%, I ran pgbench with -c 45 -j 45 and a custom
> pgbench_script.sql file containing 'SELECT now()'. The benchmark
> was conducted on a newly created 'pgbench' database, with processes
> like autovacuum, fsync, and checkpoints disabled (e.g.,
> checkpoint_timeout = '24h').
>
> I tested various 'sample_rate' values and compared the performance
> with pgss both enabled and disabled. The detailed results are attached
> in the 'sample_rate.txt' file, along with pgbench progress reports
> every 10 seconds.
>
>
> pgbench -c 45 -j 45 -T 200 -f pgbench_script.sql --progress=10 pgbench
>
> sample_rate | tps            | lat   | stddev | num of transactions
>
> 1.0         | 1207127.572580 | 0.037 | 0.030  | 241400836
> 0.8         | 1403551.516338 | 0.032 | 0.031  | 280673286
> 0.5         | 1658596.614064 | 0.027 | 0.012  | 331679344
> 0.2         | 1757747.509476 | 0.025 | 0.008  | 351507156
> 0.0         | 1760055.986259 | 0.025 | 0.008  | 351965559
> pgss off    | 1828743.791205 | 0.024 | 0.008  | 365703762
>
>
> If anyone has the capability to run this benchmark on machines with
> more CPUs or with different queries, it would be nice. I’d appreciate
> any suggestions or feedback.
>
> --
> Best regards,
> Ilia Evdokimov,
> Tantor Labs LLC.
Hi hackers,
I'd like to share the results of a new benchmark I conducted using the
pgbench tool with the -M prepared -S flags!
Previously, I shared a benchmark focused on extreme high-load scenarios,
but now I’ve conducted a more realistic test case based on Alexander’s
suggestion [0].
Machine Characteristics:
* 48 CPUs
* 374 GB RAM
* 3 TB Disk
To avoid interference from background processes, I disabled autovacuum
and fsync and effectively disabled checkpoints using the following
settings in postgresql.auto.conf:
autovacuum = off
fsync = off
checkpoint_timeout = '24h'
shared_preload_libraries = 'pg_stat_statements'
Create a test database:
createdb pgbench
pgbench -i -s 3000 pgbench
If pg_stat_statements is enabled, reset its hash table before starting
the benchmark:
psql -c 'SELECT pg_stat_statements_reset()'
Run the benchmark with the following command:
pgbench -c 46 -j 46 -T 200 -M prepared -S --progress=10 pgbench
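For repeatability, the runs across the different sample_rate values can be driven by a small script. This is only a sketch: it assumes the patch exposes the setting as pg_stat_statements.sample_rate and that the value can be changed with ALTER SYSTEM followed by a reload; the exact GUC name and context come from the patch under discussion, not from a released PostgreSQL version.

```shell
# Sketch: one benchmark run per sample_rate value (assumed GUC name:
# pg_stat_statements.sample_rate, changeable via ALTER SYSTEM + reload).
for rate in 1.0 0.8 0.5 0.2 0.0; do
    psql -d pgbench -c "ALTER SYSTEM SET pg_stat_statements.sample_rate = $rate"
    psql -d pgbench -c "SELECT pg_reload_conf()"
    psql -d pgbench -c "SELECT pg_stat_statements_reset()"
    pgbench -c 46 -j 46 -T 200 -M prepared -S --progress=10 pgbench \
        > "result_rate_${rate}.txt"
done
```

Each run's pgbench progress output lands in its own result file, so the per-rate TPS, latency, and stddev figures can be collected afterwards.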
Here are the results with and without pg_stat_statements, varying the
sample_rate parameter:
sample_rate | tps            | lat   | stddev | num of transactions
1.0         | 1299070.105018 | 0.035 | 0.055  | 259781895
0.8         | 1393327.873249 | 0.033 | 0.039  | 278630669
0.5         | 1420303.856480 | 0.032 | 0.037  | 284023396
0.2         | 1432482.672301 | 0.032 | 0.037  | 286461834
0.0         | 1760055.986259 | 0.032 | 0.036  | 290169584
pgss off    | 1460920.805259 | 0.031 | 0.036  | 292144256
When pg_stat_statements is enabled at sample_rate = 1.0, TPS decreases
by about 11% compared to pgss disabled. However, by leveraging the new
sampling feature (sample_rate), it is possible to significantly
mitigate this overhead. For detailed benchmark results with the
'-M prepared -S' flags, please see the attached file
'sample_rate_prepared.txt'.
P.S. Previously, I shared results under extreme high-load conditions
using the query 'SELECT now()'. For clarity, I'm restating those
results in the same format:
sample_rate | tps            | lat   | stddev | num of transactions
1.0         | 1207127.572580 | 0.037 | 0.030  | 241400836
0.8         | 1403551.516338 | 0.032 | 0.031  | 280673286
0.5         | 1658596.614064 | 0.027 | 0.012  | 331679344
0.2         | 1757747.509476 | 0.025 | 0.008  | 351507156
0.0         | 1760055.986259 | 0.025 | 0.008  | 351965559
pgss off    | 1828743.791205 | 0.024 | 0.008  | 365703762
Here, we observed a significant ~34% decrease in TPS when pgss was
enabled without sampling.
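As a quick arithmetic check on these two overhead figures, using only the TPS numbers already reported in the tables above (sample_rate = 1.0 vs. pgss off):

```shell
# Overhead of sample_rate = 1.0 relative to pg_stat_statements off,
# for the '-M prepared -S' run and the 'SELECT now()' run respectively.
awk 'BEGIN {
    printf "prepared -S overhead: %.0f%%\n", \
        (1460920.805259 - 1299070.105018) / 1460920.805259 * 100
    printf "SELECT now() overhead: %.0f%%\n", \
        (1828743.791205 - 1207127.572580) / 1828743.791205 * 100
}'
```

This prints roughly 11% for the prepared '-S' workload and 34% for the extreme 'SELECT now()' workload, matching the summaries above.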
Any thoughts?
--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.
Attachment: sample_rate_prepared.txt (text/plain, 13.5 KB)