Quick Links

Re: Sample rate added to pg_stat_statements

From:	Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>
To:	"Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sample rate added to pg_stat_statements
Date:	2024-11-19 14:23:01
Message-ID:	b610758f-0f19-420f-8847-1c01d8792619@tantorlabs.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 19.11.2024 15:11, Andrey M. Borodin wrote:

>
>> On 18 Nov 2024, at 23:33, Ilia Evdokimov<ilya(dot)evdokimov(at)tantorlabs(dot)com> wrote:
>>
>> Hi hackers,
>>
>> Under high-load scenarios with a significant number of transactions per second, pg_stat_statements introduces substantial overhead due to the collection and storage of statistics. Currently, we are sometimes forced to disable pg_stat_statements or adjust the size of the statistics using pg_stat_statements.max, which is not always optimal. One potential solution to this issue could be query sampling in pg_stat_statements.
>>
>> A similar approach has been implemented in extensions like auto_explain and pg_store_plans, and it has proven very useful in high-load systems. However, this approach has its trade-offs, as it sacrifices statistical accuracy for improved performance. This patch introduces a new configuration parameter, pg_stat_statements.sample_rate for the pg_stat_statements extension. The patch provides the ability to control the sampling of query statistics in pg_stat_statements.
>>
>> This patch serves as a proof of concept (POC), and I would like to hear your thoughts on whether such an approach is viable and applicable.
> +1 for the idea. I heard a lot of complaints about that pgss is costly. Most of them were using it wrong though. But at least it could give an easy way to rule out performance impact of pgss.
Thank you for review.
>> On 19 Nov 2024, at 15:09, Ilia Evdokimov<ilya(dot)evdokimov(at)tantorlabs(dot)com> wrote:
>>
>> I believe we should also include this check in the pgss_ExecutorEnd() function because sampling in pgss_ExecutorEnd() ensures that a query not initially sampled in pgss_ExecutorStart() can still be logged if it meets the pg_stat_statements.sample_rate criteria. This approach adds flexibility by allowing critical queries to be captured while maintaining efficient sampling.
> Is there a reason why pgss_ProcessUtility is excluded?
>
>
> Best regards, Andrey Borodin.
>

Ah, you’re right! Moreover, this check is needed not only in
pgss_ProcessUtility() but in all places where pgss_enabled() is called.
Therefore, it’s better to move the sampling check directly into
pgss_enabled().

However, another issue arises with the initialization of
'current_query_sample' variable that contains whether query is sampling
or not. Initializing it in pgss_ExecutorStart()||is not sufficient,
because pgss_post_parse_analyze() or pgss_ProcessUtility() might be
called earlier. This could lead to different values of
'current_query_sample' being used in these functions, which is
undesirable. Run the regression tests for pg_stat_statements with
initializing in pgss_ExecutorStart(), and you'll see this.

To avoid this, we need to find a function that is called earlier than
all the others. In my opinion, pgss_post_parse_analyze() is a good
candidate for this purpose. If you have objections or better
suggestions, feel free to share them. I attached the patch with fixes.

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.

Attachment	Content-Type	Size
v3-0001-Allow-setting-sample-ratio-for-pg_stat_statements.patch	text/x-patch	3.5 KB

In response to

Re: Sample rate added to pg_stat_statements at 2024-11-19 12:11:55 from Andrey M. Borodin

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Greg Sabino Mullane	2024-11-19 14:39:21	Re: Sample rate added to pg_stat_statements
Previous Message	Peter Eisentraut	2024-11-19 13:51:03	Re: Support LIKE with nondeterministic collations