Re: Sample rate added to pg_stat_statements

From: Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>, Greg Sabino Mullane <htamfids(at)gmail(dot)com>
Cc: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Sample rate added to pg_stat_statements
Date: 2024-11-21 09:17:22
Message-ID: b589ba2e-606e-4ade-8f42-2096458be8ec@tantorlabs.com
Lists: pgsql-hackers


On 20.11.2024 01:07, Michael Paquier wrote:
> On Tue, Nov 19, 2024 at 09:39:21AM -0500, Greg Sabino Mullane wrote:
>> Oh, and a +1 in general to the patch, OP, although it would also be nice to
>> start finding the bottlenecks that cause such performance issues.
> FWIW, I'm not eager to integrate this proposal without looking at this
> exact argument in depth.
>
> One piece of it would be to see how much of such "bottlenecks" we
> would be able to get rid of by integrating pg_stat_statements into
> the central pgstats with the custom APIs, without pushing the module
> into core. This means that we would combine the existing hash of pgss
> to shrink to 8 bytes for objid rather than 13 bytes now as the current
> code relies on (toplevel, userid, queryid) for the entry lookup (entry
> removal is sniped with these three values as well, or dshash seq
> scans). The odds of conflicts still play in our favor even if
> we have a few million entries, or even ten times that.
>
> This would also get rid of the pgss text file for the queries, which
> is a source of one of the bottlenecks, as we could just store query
> strings upper-bounded based on a postmaster GUC to control the size of
> the entries in the pgstats dshash. More normalization for IN and ANY
> clauses would also help a lot here; these are a cause of a lot of
> bloat.
>
> This integration is not something I will be able to work on for the
> PG18 dev cycle as I'm in full review/commit mode for the rest of this
> release, but I got some plans for it in PG19 except if somebody beats
> me to it.
> --
> Michael

I agree. Your proposal can indeed improve performance. Currently, I am
working on these changes and will validate them with benchmarks. Once I
have concrete results, I will open new threads to facilitate further
discussion.

However, in my opinion, the suggested improvements are not enough, and
sampling is essential.

1. I agree with Greg that pgss is widely used. It's quite odd that
sampling exists in 'auto_explain' but not in pgss.

2. If performance issues arise even after these improvements and it
turns out that pgss is the cause, the only painless solution without
restarting the instance is sampling. The current pgss parameters are
not well suited to achieving this.

BTW, I forgot to handle the case of nested statements: either all of
them will be tracked or none. I have attached a new version of the patch.
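
To illustrate the "all or none" rule for nested statements, here is a
minimal standalone sketch. It is not the code from the attached patch:
the struct, the function names and the use of rand() are assumptions
made only for illustration. The idea is that the sampling decision is
drawn once, when a top-level statement starts, and every statement
nested under it reuses that decision.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical state; names are illustrative, not taken from the patch. */
typedef struct SamplerState
{
    double sample_rate;   /* fraction of top-level statements to track, 0.0..1.0 */
    int    nesting_level; /* 0 while no statement is executing */
    bool   is_sampled;    /* decision made for the current top-level statement */
} SamplerState;

/* Uniform value in [0, 1); stand-in for whatever PRNG the module would use. */
static double
sampler_random(void)
{
    return (double) rand() / ((double) RAND_MAX + 1.0);
}

/*
 * Call when a statement starts executing. The decision is drawn only at
 * nesting level 0; nested statements inherit it, so a top-level statement
 * and everything it runs are tracked together or not at all.
 */
static bool
sampler_enter(SamplerState *s)
{
    if (s->nesting_level == 0)
        s->is_sampled = (sampler_random() < s->sample_rate);
    s->nesting_level++;
    return s->is_sampled;
}

/* Call when a statement finishes executing. */
static void
sampler_leave(SamplerState *s)
{
    s->nesting_level--;
}
```

With sample_rate set to 1.0 every statement is tracked, and with 0.0
none is. For any value in between, a nested statement never gets a
different answer from its top-level caller.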

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.

Attachment Content-Type Size
v4-0001-Allow-setting-sample-ratio-for-pg_stat_statements.patch text/x-patch 3.6 KB
