From: Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>
To: Sami Imseih <samimseih(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Greg Sabino Mullane <htamfids(at)gmail(dot)com>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Sample rate added to pg_stat_statements
Date: 2025-02-10 08:05:29
Message-ID: 3546e2a7-5831-4c0a-996b-358d4ec483a6@tantorlabs.com
Lists: pgsql-hackers
Hi hackers,
Since the current patch is in the commitfest with the status 'Ready for
Committer', I'd like to summarize what it does, the problems it
addresses, and answer the key questions raised in the discussion thread.
Enabling pg_stat_statements can cause a performance drop for two main
reasons:
* Contention on the exclusive lock, which blocks the hash table when
allocating or deallocating an entry.
* Contention on the spinlock, which blocks access to an existing entry.
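To illustrate the second point, here is a minimal, self-contained sketch of the per-entry critical section. The struct and function names are illustrative stand-ins, not the module's actual code, and a portable pthread mutex stands in for PostgreSQL's slock_t:

```c
#include <pthread.h>

/* Simplified stand-in for a pg_stat_statements hash-table entry; the real
 * module protects each entry's counters with a slock_t spinlock (e->mutex). */
typedef struct Counters
{
    long    calls;
    double  total_time;
} Counters;

typedef struct pgssEntry
{
    pthread_mutex_t mutex;      /* stand-in for PostgreSQL's slock_t */
    Counters        counters;
} pgssEntry;

/* Every execution of an already-tracked query goes through this critical
 * section, so a very hot statement serializes all backends on one lock. */
static void
record_execution(pgssEntry *e, double elapsed_ms)
{
    pthread_mutex_lock(&e->mutex);
    e->counters.calls++;
    e->counters.total_time += elapsed_ms;
    pthread_mutex_unlock(&e->mutex);
}
```

The point is that the lock is taken on every execution of the same entry, which is why contention grows with query frequency and core count rather than with the number of distinct statements.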
The exclusive lock issue can be mitigated by normalizing similar queries
to the same queryid; there is an ongoing thread discussing this approach
for IN and ARRAY queries [0]. Another solution is integrating
pg_stat_statements into the central pgstats machinery via custom APIs,
without pushing the module into core [1].
However, even if we do all that, the spinlock contention problem will
persist for very frequent queries. This is exactly what I aim to solve
with sampling. Other approaches, such as replacing the spinlock with
atomic variables or an lwlock, have not yielded good results [2]. Even
track_planning was disabled by default to reduce contention, but on
high-core systems, the issue still remains. So far, I see no alternative
solution to this problem.
The benchmark results for different numbers of CPUs, demonstrating the
impact of this patch:
pgbench -c{CPUs} -j20 -S -Mprepared -T120
CPUs | sample_rate |     tps | CPU waits | ClientRead wait | SpinDelay wait
 192 |        1.0  |  484338 |      9568 |             929 |          11107
 192 |        0.75 |  909547 |     12079 |            2100 |           4781
 192 |        0.5  | 1028594 |     13253 |            3378 |            174
 192 |        0.25 | 1019507 |     13397 |            3423 |              -
 192 |        0.0  | 1015425 |     13106 |            3502 |              -
  48 |        1.0  |  643251 |       911 |             932 |             44
  48 |        0.75 |  653946 |       939 |             939 |              3
  48 |        0.5  |  651654 |       841 |             932 |              -
  48 |        0.25 |  652668 |       860 |             910 |              -
  48 |        0.0  |  659111 |       849 |             882 |              -
  32 |        1.0  |  620667 |      1782 |             560 |              -
  32 |        0.75 |  620663 |      1736 |             554 |              -
  32 |        0.5  |  624094 |      1741 |             648 |              -
  32 |        0.25 |  628638 |      1702 |             576 |              -
  32 |        0.0  |  630483 |      1638 |             574 |              -
Some suggested sampling based on execution time instead of a random
generator. While this is a good idea, it has a critical limitation.
Execution time can only be checked in pgss_ExecutorEnd. However,
pgss_planner already interacts with the spinlock, and we cannot sample
based on execution time at that stage. Finding a criterion that works
for both pgss_planner and pgss_ExecutorEnd is difficult. This is why
random sampling remains the most viable option.
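A minimal sketch of the idea, with illustrative names rather than the patch's exact code: the decision is drawn once per query and the cached result is then consulted by every hook (pgss_planner, pgss_ExecutorEnd), so all of them agree. The real module would use PostgreSQL's PRNG; rand() stands in here for portability:

```c
#include <stdbool.h>
#include <stdlib.h>

/* Illustrative stand-in for a pg_stat_statements.sample_rate-style GUC. */
static double pgss_sample_rate = 0.75;

/* Cached per-query decision, drawn once and reused by every hook. */
static bool current_query_sampled = true;

/* Draw the sampling decision for a new query.  Rates of 1.0 and 0.0 are
 * short-circuited so the default behavior has no PRNG cost. */
static void
decide_sampling(void)
{
    if (pgss_sample_rate >= 1.0)
        current_query_sampled = true;
    else if (pgss_sample_rate <= 0.0)
        current_query_sampled = false;
    else
        current_query_sampled =
            ((double) rand() / (double) RAND_MAX) < pgss_sample_rate;
}
```

Because the decision is made before planning, it is usable at every stage, which an execution-time criterion cannot be.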
By the way, instead of using a bool flag to check whether we are inside
pgss_post_parse_analyze, we now pass a pgssStoreKind to pgss_enabled.
This makes the logic clearer and more readable, explicitly indicating
where we need to check whether a query should be sampled. This change is
included in the new v15 patch.
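As a rough, hypothetical simplification of that change (the enum values follow the existing pgssStoreKind in pg_stat_statements.c; the function body and the current_query_sampled variable are illustrative assumptions, not the patch verbatim):

```c
#include <stdbool.h>

/* Store kinds as they appear in pg_stat_statements.c. */
typedef enum pgssStoreKind
{
    PGSS_INVALID = -1,
    PGSS_PLAN = 0,
    PGSS_EXEC,
    PGSS_NUMKIND
} pgssStoreKind;

/* Cached per-query sampling decision (illustrative). */
static bool current_query_sampled = true;

/* Passing the kind makes explicit which hook is asking, instead of
 * consulting a separate module-level "inside post_parse_analyze" flag. */
static bool
pgss_store_enabled(pgssStoreKind kind)
{
    if (kind == PGSS_INVALID)
        return true;                /* pre-decision stages bypass sampling */
    return current_query_sampled;   /* plan/exec stages honor the decision */
}
```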
Are there any remaining concerns or alternative suggestions regarding
this approach? I'm happy to discuss further.
[0]: https://commitfest.postgresql.org/51/2837/
[1]: https://www.postgresql.org/message-id/Zz0MFPq1s4WFxxpL%40paquier.xyz
[2]: https://commitfest.postgresql.org/29/2634/
--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.
Attachment: v15-0001-Allow-setting-sample-rate-for-pg_stat_statements.patch (text/x-patch, 13.7 KB)