Re: Sample rate added to pg_stat_statements

From: Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>
To: Sami Imseih <samimseih(at)gmail(dot)com>, "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Greg Sabino Mullane <htamfids(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Sample rate added to pg_stat_statements
Date: 2025-01-08 19:39:18
Message-ID: 6cf6270b-e8fa-4d99-b74a-17e781cdf309@tantorlabs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On 07.01.2025 22:29, Sami Imseih wrote:
>>> You are right. This is absolutely unexpected for users. Thank you for
>>> the review.
>>>
>>> To fix this, I suggest storing a random number in the [0, 1) range in a
>>> separate variable, which will be used for comparisons in any place. We
>>> will compare 'sample_rate' and random value not only in
>>> pgss_post_parse_analyze(), but also in pgss_ProcessUtility() and
>>> pgss_planner().
>>>
>>> I attached patch with test and fixes.
> I still think this is not the correct approach. It seems that post_parse_analyze
> should not have to deal with checking for a sample rate. This is because
> post_parse_analyze, which is the only hook with access to jstate, is
> only responsible
> for storing a normalized query text on disk and creating a not-yet
> user visible entry
> in the hash. i.e. pgss_store will never increment counters when called from
> pgss_post_parse_analyze.
>
> /* Increment the counts, except when jstate is not NULL */
> if (!jstate)
> {
>
> What I think may be a better idea is to add something similar
> to auto_explain.c, but it should only be added to the top of
> pgss_planner, pgss_ExecutorStart and pgss_ProcessUtility.
>
> if (nesting_level == 0)
> {
> if (!IsParallelWorker())
> current_query_sampled = pg_prng_double(&pg_global_prng_state) <
> pgss_sample_rate;
> else
> current_query_sampled = false;
> }
>
> This is needed for ExecutorStart and not ExecutorEnd because
> ExecutorStart is where the instrumentation is initialized with
> queryDesc->totaltime. The above code block could be
> turned into a macro and reused in the routines mentioned.

I added this code in to the top of pgss_planner, pgss_ExecutorStart and
pgss_ProcessUtility. I removed this checking from
pgss_post_parse_analyze. I attached v9-patch.

>
> However, it seems with this change, we can see a much
> higher likelihood of non-normalized query texts being stored.
> This is because jstate is only available during post_parse_analyze.
> Therefore, if the first time you are sampling the statement is in ExecutorEnd,
> then you will end up storing a non-normalized version of the query text,
> see below example with the attached v8.
>
> postgres=# set pg_stat_statements.sample_rate = 0;
> SET
> postgres=# select pg_stat_statements_reset();
> pg_stat_statements_reset
> -------------------------------
> 2025-01-07 13:02:11.807952-06
> (1 row)
>
> postgres=# SELECT 1 \parse stmt
>
> postgres=# \bind_named stmt \g
> ?column?
> ----------
> 1
> (1 row)
>
> postgres=# \bind_named stmt \g
> ?column?
> ----------
> 1
> (1 row)
>
> postgres=# SELECT query, calls FROM pg_stat_statements;
> query | calls
> -------+-------
> (0 rows)
>
> postgres=# set pg_stat_statements.sample_rate = 1;
> SET
> postgres=# \bind_named stmt \g
> ?column?
> ----------
> 1
> (1 row)
>
> postgres=# \bind_named stmt \g
> ?column?
> ----------
> 1
> (1 row)
>
> postgres=# SELECT query, calls FROM pg_stat_statements;
> query | calls
> -------+-------
> (0 rows)
>
> postgres=# \bind_named stmt \g
> ?column?
> ----------
> 1
> (1 row)
>
> postgres=# SELECT query, calls FROM pg_stat_statements;
> query | calls
> ---------------------------------------------+-------
> SELECT 1 | 1
> SELECT query, calls FROM pg_stat_statements | 1
> (2 rows)
>
> postgres=# \bind_named stmt \g
> ?column?
> ----------
> 1
> (1 row)
>
> postgres=# SELECT query, calls FROM pg_stat_statements;
> query | calls
> ---------------------------------------------+-------
> SELECT 1 | 2
> SELECT query, calls FROM pg_stat_statements | 2
> (2 rows)
>
> One idea is to make jstate available to all hooks, and completely
> remove reliance on post_parse_analyze in pg_stat_statements.
> I can't find the thread now, but I know this has come up in past discussions
> when troubleshooting gaps in query normalization. My concern is this
> feature will greatly increase the likelihood of non-normalized query texts.
>
> Also, with regards to the benchmarks, it seems
> sampling will be beneficial for workloads that are touching a small number
> of entries with high concurrency. This is why we see benefit for
> a standard pgbench workload.
> Samping becomes less beneficial when there is a large set of queries
> being updated.
> I still think this is a good approach for workloads that touch a small set
> of entries.
>
> Regards,
>
> Sami
>
>

After the changes in v9-patch, won’t all the necessary queries now be
normalized? Since there are no longer any restrictions in the parser
hook, queries will be normalized as usual, and pgss_planner,
pgss_ExecutorStart, and ExecutorEnd will simply fetch them from
'pgss_query_texts.stat' file.

For now, this seems like the simplest approach instead of making
JumbleState available to other hooks. Moreover, if we do proceed with
that, we might be able to remove the 'pgss_query_texts.stat' file
altogether and more other improvements. In my view, if no other options
arise, we should address making JumbleState available to other hooks in
a separate thread. If you have any suggestions, I'm always open to feedback.

I attached v9 patch with changes and test above.

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.

Attachment Content-Type Size
v9-0001-Allow-setting-sample-ratio-for-pg_stat_statements.patch text/x-patch 8.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2025-01-08 19:58:25 Re: using PGOPTIONS in meson
Previous Message Peter Eisentraut 2025-01-08 19:37:12 Re: [PoC] Federated Authn/z with OAUTHBEARER