Re: Sample rate added to pg_stat_statements

From: Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>
To: Sami Imseih <samimseih(at)gmail(dot)com>
Cc: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Greg Sabino Mullane <htamfids(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Sample rate added to pg_stat_statements
Date: 2025-01-29 18:52:21
Message-ID: 18631d46-1741-4edc-b116-8d9631cdf919@tantorlabs.com
Lists: pgsql-hackers


On 28.01.2025 23:50, Ilia Evdokimov wrote:
>
>>
>>> If anyone has the capability to run this benchmark on machines with
>>> more
>>> CPUs or with different queries, it would be nice. I’d appreciate any
>>> suggestions or feedback.
>> I wanted to share some additional benchmarks I ran as well
>> on a r8g.48xlarge ( 192 vCPUs, 1,536 GiB of memory) configured
>> with 16GB of shared_buffers. I also attached the benchmark.sh
>> script used to generate the output.
>> The benchmark is running the select-only pgbench workload,
>> so we have a single heavily contentious entry, which is the
>> worst case.
>>
>> The test shows that the spinlock (SpinDelay waits)
>> becomes an issue at high connection counts and will
>> become worse on larger machines. A sample_rate going from
>> 1 to .75 shows a 60% improvement; but this is on a single
>> contentious entry. Most workloads will likely not see this type
>> of improvement. I also could not really observe
>> this type of difference on smaller machines (i.e., 32 vCPUs),
>> as expected.
>>
>> ## init
>> pgbench -i -s500
>>
>> ### 192 connections
>> pgbench -c192 -j20 -S -Mprepared -T120 --progress 10
>>
>> sample_rate = 1
>> tps = 484338.769799 (without initial connection time)
>> waits
>> -----
>>    11107  SpinDelay
>>     9568  CPU
>>      929  ClientRead
>>       13  DataFileRead
>>        3  BufferMapping
>>
>> sample_rate = .75
>> tps = 909547.562124 (without initial connection time)
>> waits
>> -----
>>    12079  CPU
>>     4781  SpinDelay
>>     2100  ClientRead
>>
>> sample_rate = .5
>> tps = 1028594.555273 (without initial connection time)
>> waits
>> -----
>>    13253  CPU
>>     3378  ClientRead
>>      174  SpinDelay
>>
>> sample_rate = .25
>> tps = 1019507.126313 (without initial connection time)
>> waits
>> -----
>>    13397  CPU
>>     3423  ClientRead
>>
>> sample_rate = 0
>> tps = 1015425.288538 (without initial connection time)
>> waits
>> -----
>>    13106  CPU
>>     3502  ClientRead
>>
>> ### 32 connections
>> pgbench -c32 -j20 -S -Mprepared -T120 --progress 10
>>
>> sample_rate = 1
>> tps = 620667.049565 (without initial connection time)
>> waits
>> -----
>>     1782  CPU
>>      560  ClientRead
>>
>> sample_rate = .75
>> tps = 620663.131347 (without initial connection time)
>> waits
>> -----
>>     1736  CPU
>>      554  ClientRead
>>
>> sample_rate = .5
>> tps = 624094.688239 (without initial connection time)
>> waits
>> -----
>>     1741  CPU
>>      648  ClientRead
>>
>> sample_rate = .25
>> tps = 628638.538204 (without initial connection time)
>> waits
>> -----
>>     1702  CPU
>>      576  ClientRead
>>
>> sample_rate = 0
>> tps = 630483.464912 (without initial connection time)
>> waits
>> -----
>>     1638  CPU
>>      574  ClientRead
>>
>> Regards,
>>
>> Sami
>
>
> Thank you so much for benchmarking this on a pretty large machine with
> a large number of CPUs. The results look fantastic, and I truly
> appreciate your effort.
>
> BTW, I realized that the 'sampling' test needs to be added not only to
> the Makefile but also to meson.build. I've included that in the v14
> patch.
>
> --
> Best regards,
> Ilia Evdokimov,
> Tantor Labs LLC.

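For reference, the meson.build piece mentioned in the quote above is
presumably just registering the new regression script alongside the
existing ones. An abbreviated sketch of what that could look like in
contrib/pg_stat_statements/meson.build (not the exact v14 hunk; the
surrounding entries are elided):

    tests += {
      'name': 'pg_stat_statements',
      'sd': meson.current_source_dir(),
      'bd': meson.current_build_dir(),
      'regress': {
        'sql': [
          'select',
          # ... existing regression scripts ...
          'sampling',   # new test, kept in sync with REGRESS in the Makefile
        ],
        'regress_args': ['--temp-config', files('pg_stat_statements.conf')],
      },
    }
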
In my opinion, if we can't observe the spinlock bottleneck at 32 CPUs, we
should determine the CPU count at which it appears. That would help us
understand the scale of the problem. Does this make sense, or are there
really no real workloads where the same query runs on more than 32 CPUs,
and we've been trying to solve a non-existent problem?
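
For anyone skimming the thread, the contention being measured is the
per-entry spinlock taken in pgss_store() when a query's counters are
updated. Roughly, sampling sidesteps it as sketched below; the GUC and
variable names here are illustrative, not copied from the v14 patch:

    /* Illustrative sketch only -- pgss_sample_rate is an assumed name. */
    #include "common/pg_prng.h"    /* pg_prng_double(), pg_global_prng_state */
    #include "storage/spin.h"      /* SpinLockAcquire()/SpinLockRelease() */

    /* In pgss_store(), before updating the shared hash entry: */
    if (pgss_sample_rate < 1.0 &&
        pg_prng_double(&pg_global_prng_state) >= pgss_sample_rate)
        return;             /* not sampled: skip the stats update entirely */

    /* Only sampled executions contend on the per-entry spinlock,
     * which is what the SpinDelay waits above are measuring. */
    SpinLockAcquire(&entry->mutex);
    entry->counters.calls += 1;
    SpinLockRelease(&entry->mutex);

With a single hot entry, every backend that samples still funnels through
the same lock, which is why the effect only shows up at high concurrency.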

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.
