Re: Sample rate added to pg_stat_statements

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	Michael Paquier <michael(at)paquier(dot)xyz>
Cc:	Greg Sabino Mullane <htamfids(at)gmail(dot)com>, "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sample rate added to pg_stat_statements
Date:	2024-11-22 06:08:28
Message-ID:	CAPpHfdsTKAQqC3A48-MGQhrhfEamXZPb64w=utk7thQcOMNr7Q@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Nov 20, 2024 at 12:07 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> On Tue, Nov 19, 2024 at 09:39:21AM -0500, Greg Sabino Mullane wrote:
> > Oh, and a +1 in general to the patch, OP, although it would also be nice to
> > start finding the bottlenecks that cause such performance issues.
>
> FWIW, I'm not eager to integrate this proposal without looking at this
> exact argument in depth.
>
> One piece of it would be to see how much of such "bottlenecks" we
> would be able to get rid of by integrating pg_stat_statements into
> the central pgstats with the custom APIs, without pushing the module
> into core. This means that we would combine the existing hash of pgss
> to shrink to 8 bytes for objid rather than 13 bytes now as the current
> code relies on (toplevel, userid, queryid) for the entry lookup (entry
> removal is sniped with these three values as well, or dshash seq
> scans). The odds of conflicts still play in our favor even if
> we have a few million entries, or even ten times that.

If you run "pgbench -S -M prepared" on a fairly large machine with
high concurrency, the spinlock in pgss_store() can become a
significant bottleneck. And I'm not sure that switching all counters
to atomics would improve the situation, given how many counters we
already have.

I'm generally +1 for the approach taken in this thread. But I would
suggest introducing a threshold value for query execution time, and
sampling only the queries below that threshold. Slower queries
shouldn't be sampled, because they can't be too frequent, and also it
could be more valuable to count them individually (while very fast
queries probably only matter "on average").
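
To illustrate the threshold idea, here is a minimal sketch of the
decision only. It is not code from the patch: the function name
should_record and its parameters (exec_time_ms, threshold_ms,
sample_rate) are hypothetical, and the real logic would live in C
inside pg_stat_statements.

```python
import random


def should_record(exec_time_ms, threshold_ms, sample_rate, rng=random.random):
    """Decide whether one query execution gets recorded.

    All names here are hypothetical, chosen only for illustration.
    rng is injectable so the decision can be tested deterministically.
    """
    # Queries at or above the threshold are always recorded: they are
    # rare, and each one is worth counting individually.
    if exec_time_ms >= threshold_ms:
        return True
    # Fast queries are sampled: recorded with probability sample_rate.
    return rng() < sample_rate


if __name__ == "__main__":
    # A slow query is recorded even when the sample rate is zero.
    print(should_record(500.0, threshold_ms=100.0, sample_rate=0.0))
    # A fast query is recorded only some of the time.
    print(should_record(1.0, threshold_ms=100.0, sample_rate=0.1))
```

With this shape, the expensive path (taking the spinlock to update
counters) would be skipped for most fast executions, while slow
executions are never dropped.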

------
Regards,
Alexander Korotkov
Supabase

In response to

Re: Sample rate added to pg_stat_statements at 2024-11-19 22:07:16 from Michael Paquier

Responses

Re: Sample rate added to pg_stat_statements at 2024-11-25 22:15:25 from Ilia Evdokimov
Re: Sample rate added to pg_stat_statements at 2025-01-09 21:16:17 from Ilia Evdokimov
