From: | Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com> |
---|---|
To: | Sami Imseih <samimseih(at)gmail(dot)com> |
Cc: | Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Greg Sabino Mullane <htamfids(at)gmail(dot)com>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Sample rate added to pg_stat_statements |
Date: | 2025-02-19 22:59:02 |
Message-ID: | 6f305fd1-44c3-4189-81ef-238777277e5e@tantorlabs.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 19.02.2025 22:11, Sami Imseih wrote:
> I don't see v18 attached.
Ah, sorry, I attached v18.
> But, I think it's wrong that you changed the design of this patch
> (sample rate to duration based ) after it was ready for committer.
I've decided to reconsider how to better reduce the load on the
spin-lock in pg_stat_statements.
If we look at the benchmark with 192 CPUs [0], we can see that even a
small reduction in the frequency of incrementing counter of entries
gives a significant performance boost. For example, at sample_rate =
0.75, performance almost doubles, while at sample_rate < 0.5, the
spin-lock bottleneck is already eliminated. This suggests that even
small adjustments to the frequency can significantly improve the situation.
But instead of blindly reducing the frequency via PRNG, we can take a
more thoughtful approach with threshold by execute time:
* Find the most frequent query by column 'calls' in pg_stat_statements;
* In this query look at info about execution time: min_exec_time,
max_exec_time, etc;
* Gradually increase the threshold from min_exec_time to
max_exec_time, limiting the tracking of this query.
* Monitor performance: once the bottleneck is resolved, stop at the
current threshold value.
This approach allows us to:
* Eliminate the spin-lock bottleneck;
* Preserve data about slow queries, which may be critical for
performance analysis;
* Reduce the load on the most frequent queries causing contention,
instead of uniformly reducing the frequency for all queries.
BTW, I track queries in pgss_ProcessUtility without checking exec time
because these queries unlikely can lead to bottleneck.
Any other thoughts or objections?
>
> This CF [0] should go back to "Needs Review" if this is the case.
Done!
P.S. During the recent review, I noticed an interesting behavior: when
we introduced \parse, changing the 'track' from 'none' to 'all' might
lead to an undefined behavior. The steps are:
postgres=# SELECT pg_stat_statements_reset() IS NOT NULL AS t; t --- t
(1 row) postgres=# SET pg_stat_statements.track = "none"; SET postgres=#
SELECT 1 \parse stmt ?column? ---------- 1 (1 row) postgres=#
\bind_named stmt \g ?column? ---------- 1 (1 row) postgres=# SELECT
query, calls FROM pg_stat_statements; query | calls -------+------- (0
rows) postgres=# SET pg_stat_statements.track = "all"; SET postgres=#
\bind_named stmt \g ?column? ---------- 1 (1 row) postgres=# SELECT
query, calls FROM pg_stat_statements; query | calls ----------+-------
SELECT 1 | 1 (1 row)
As you can see, the query has not been normalized. Is it a bug, expected
or undefined behavior?
-- Best regards, Ilia Evdokimov, Tantor Labs LLC.
Attachment | Content-Type | Size |
---|---|---|
v18-0001-Allow-setting-execution-time-threshold-for-pgss.patch | text/x-patch | 14.0 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Thomas Munro | 2025-02-19 23:12:21 | Re: BitmapHeapScan streaming read user and prelim refactoring |
Previous Message | Tomas Vondra | 2025-02-19 22:56:02 | Re: Should heapam_estimate_rel_size consider fillfactor? |