Re: Sample rate added to pg_stat_statements

From: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>
To: Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Sample rate added to pg_stat_statements
Date: 2024-11-19 12:11:55
Message-ID: 6707FCA9-FD1A-4609-A1A6-142456C14E0C@yandex-team.ru
Lists: pgsql-hackers

> On 18 Nov 2024, at 23:33, Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com> wrote:
>
> Hi hackers,
>
> Under high-load scenarios with a significant number of transactions per second, pg_stat_statements introduces substantial overhead due to the collection and storage of statistics. Currently, we are sometimes forced to disable pg_stat_statements or adjust the size of the statistics using pg_stat_statements.max, which is not always optimal. One potential solution to this issue could be query sampling in pg_stat_statements.
>
> A similar approach has been implemented in extensions like auto_explain and pg_store_plans, and it has proven very useful in high-load systems. However, this approach has its trade-offs, as it sacrifices statistical accuracy for improved performance. This patch introduces a new configuration parameter, pg_stat_statements.sample_rate for the pg_stat_statements extension. The patch provides the ability to control the sampling of query statistics in pg_stat_statements.
>
> This patch serves as a proof of concept (POC), and I would like to hear your thoughts on whether such an approach is viable and applicable.

+1 for the idea. I have heard a lot of complaints that pgss is costly. Most of the people complaining were using it wrong, though. But sampling would at least give an easy way to rule out the performance impact of pgss.

> On 19 Nov 2024, at 15:09, Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com> wrote:
>
> I believe we should also include this check in the pgss_ExecutorEnd() function because sampling in pgss_ExecutorEnd() ensures that a query not initially sampled in pgss_ExecutorStart() can still be logged if it meets the pg_stat_statements.sample_rate criteria. This approach adds flexibility by allowing critical queries to be captured while maintaining efficient sampling.

Is there a reason why pgss_ProcessUtility is excluded?
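
To make the question concrete, here is a minimal standalone sketch of the kind of per-query sampling check under discussion. It is not taken from the patch: the names sample_rate, random_fraction() and should_sample() are hypothetical, the variable merely stands in for the proposed pg_stat_statements.sample_rate setting, and rand() is used only to keep the example self-contained (code inside the server would use PostgreSQL's own PRNG).

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the proposed pg_stat_statements.sample_rate. */
static double sample_rate = 1.0;

/*
 * Return a pseudo-random value in [0, 1).
 * Assumption: rand() is good enough for illustration only.
 */
static double
random_fraction(void)
{
    return (double) rand() / ((double) RAND_MAX + 1.0);
}

/*
 * Decide whether the current statement should be tracked.
 * A rate of 1.0 tracks everything, 0.0 tracks nothing, and values
 * in between track roughly that fraction of statements.
 */
static bool
should_sample(void)
{
    if (sample_rate >= 1.0)
        return true;
    if (sample_rate <= 0.0)
        return false;
    return random_fraction() < sample_rate;
}
```

If a check like should_sample() guards only the executor hooks, utility statements would still be tracked unconditionally, which is why I am asking about pgss_ProcessUtility.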

Best regards, Andrey Borodin.
