Re: Sample rate added to pg_stat_statements

From: Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>
To: Sami Imseih <samimseih(at)gmail(dot)com>
Cc: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Greg Sabino Mullane <htamfids(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Sample rate added to pg_stat_statements
Date: 2025-01-28 02:56:57
Message-ID: 0c7770bd-9750-4200-a472-bdb262583c2f@tantorlabs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On 28.01.2025 02:00, Sami Imseih wrote:
> Hi,
>
> I have a few comments about the patch.
>
> 1/
> tests are missing for pg_stat_statements.track=all and with a sample
> rate < 1 applied.
> Can we add tests for this case?

I think I’ve come up with a good idea for implementing this. We can
create a new file, sampling.sql, for regression tests, where we move
written tests in select.out and check test with setting
"pg_stat_statements.track = 'all'" and "pg_stat_statements.sample_rate =
0.5". Additionally, we can create two files in the 'expected' directory:
sampling.out and sampling_1.out where we can verify either all nested
statements are tracked or none of them.

>
> 2/
>
> This declaration:
>
> +static bool current_query_sampled = false;
>
> should be moved near the declaration of nesting_level,
> the other local variable.

+1

>
>
> 3/
> I do not really like the calls to update_current_query_sampled()
> the way there are, and I think the pgss_enabled should encapsulate
> the new sample rate check as well.
>
> My thoughts are:
>
> 1/ change the name of the function from update_current_query_sampled
> to current_query_sampled
>
> Also, the static bool should be called is_query_sampled or something like that.
>
> The function should also allow you to skip the sample rate check, such as if
> called from pgss_post_parse_analyze.
>
> We can also remove the IsParallelWorker() check in this case, since that
> is done in pgss_enabled.
>
> something like this is what I am thinking:
>
> static bool
> current_query_sampled(bool skip)
> {
> if (skip)
> return true;
>
> if (nesting_level == 0)
> {
> is_query_sampled = pgss_sample_rate != 0.0 &&
> (pgss_sample_rate == 1.0 ||
>
> pg_prng_double(&pg_global_prng_state) <= pgss_sample_rate);
> }
>
> return is_query_sampled;
> }
>
> #define pgss_enabled(level, skip_sampling_check) \
> (!IsParallelWorker() && \
> (pgss_track == PGSS_TRACK_ALL || \
> (pgss_track == PGSS_TRACK_TOP && (level) == 0)) && \
> current_query_sampled(skip_sampling_check))
>
> What do you think?

I agree with all of these changes; they make the code more readable.
Adding comments to the current_query_sampled() function will make it
even clearer.

>
> 4/ Now we have pg_stat_statements.track = "off" which is effectively
> the same thing as pg_stat_statements.sample_rate = "0". Does this need
> to be called out in docs?

+1, because we have the same thing in log_statement_sample_rate.

All the changes mentioned above are included in the v13 patch. Since the
patch status is 'Ready for Committer,' I believe it is now better for
upstream inclusion, with improved details in tests and documentation. Do
you have any further suggestions?

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.

Attachment Content-Type Size
v13-0001-Allow-setting-sample-rate-for-pg_stat_statements.patch text/x-patch 13.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Smith 2025-01-28 03:01:20 Re: Skip collecting decoded changes of already-aborted transactions
Previous Message jian he 2025-01-28 02:48:25 Re: in BeginCopyTo make materialized view using COPY TO instead of COPY (query).