Re: Increased iowait and blk_read_time with higher shared_buffers

From: Jordan Hurwich <jhurwich(at)pulsasensors(dot)com>
To: Samed YILDIRIM <samed(at)reddoc(dot)net>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org, Gautam Bellary <gautam(at)pulsasensors(dot)com>
Subject: Re: Increased iowait and blk_read_time with higher shared_buffers
Date: 2022-12-14 18:12:16
Message-ID: CAJKqsjvsuFWoBD7++aPRmdfrr7UmsZbc2Whg6zqd3VQB02FTmg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Thanks for your thoughtful response Samed.

I'm familiar with the article you linked to, and part of my surprise is
that with these 32GB RAM machines we're seeing better performance at 12.5%
(4GB) than the commonly recommended 25% (8GB) of system memory for
shared_buffers. Your notes about disk read stats from Postgres potentially
actually representing blocks read from the OS cache make sense, I just
imagined that Postgres would be better at managing the memory when it was
dedicated to it via shared_buffers than the OS (obviously with some point
of diminishing returns); and I'm still hoping there's some Postgres
configuration change we can make that enables better performance through
improved utilization of shared_buffers at the commonly recommended 25% of
system memory.

You mentioned effective_cache_size, which we currently have set to 16GB
(50% of system memory). Is it worth us experimenting with that value, if so
would you recommend we try reducing it or increasing it? Are there other
settings that we might consider to see if we can improve the utilization of
shared_buffers at higher values like 8GB (25% of system memory)?

On Wed, Dec 14, 2022 at 4:38 AM Samed YILDIRIM <samed(at)reddoc(dot)net> wrote:

> Hi Jordan,
>
> Increased shared buffer size does not necessarily mean an increased
> performance.
>
> Regarding the negative correlation between IOWait and shared_buffers'
> size; if you don't increase memory of the system, it is an expected result
> in my opinion. Because, PostgreSQL starts reserving a bigger portion of the
> system memory, and the OS cache size decreases respectively. Smaller OS
> cache can easily result with more disk access and higher IO demand and
> bigger IOWait.
>
> As you can see in graphs, when you increase the size of shared_buffers,
> you see higher block hits and lower block reads. "hits" refers to the
> blocks that are already in shared_buffers. "reads" refers to the blocks
> that are not in shared_buffers and *"read from* *disk"*. But, *"read from
> disk"* that you see in PostgreSQL's statistic catalogs doesn't mean all
> of those blocks were read from the disk. PostgreSQL requests data blocks,
> which are not already in shared_buffers, from the kernel. And, if the
> requested block is in the OS cache, the kernel provides it directly from
> the memory. No disk access, therefore, happens at all. And, you observe
> that through lower disk access (I/O) and lower IOWait on your operating
> system.
>
> When you increase size of shared_buffers without increasing amount of the
> system memory and with or without decreasing effective_cache_size,
> PostgreSQL considers the possibility of the block to be requested on the
> memory lower than previous configuration. So, it creates execution plans
> with less index usages. Less index usage means more sequential scan. More
> sequential scan means more disk read. We already have less OS cache. And
> the system has to carry out more disk accesses.
>
> As you can see, they are all connected. Setting shared_buffers higher than
> a threshold, which varies from database to database, actually decreases
> your performance.
>
> To conclude, your results are expected results.
>
> A useful resource to read:
> https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
>
>> ..... given the way PostgreSQL also relies on the operating system
>> cache, it's unlikely you'll find using more than 40% of RAM to work better
>> than a smaller amount.
>>
>
> Best regards.
> Samed YILDIRIM
>
>
> On Tue, 13 Dec 2022 at 02:29, Jordan Hurwich <jhurwich(at)pulsasensors(dot)com>
> wrote:
>
>> Hi everyone,
>> I'm writing to ask about a correlation I was surprised to observe on our
>> PSQL machines (particularly read-only standbys) where increasing
>> "shared_buffers" appears to result in
>> increased pg_stat_database.blk_read_time and CPU iowait, which in turn
>> seems to correlate with reduced throughput for our query-heavy services -
>> details below.
>>
>> Is this expected, or are there configuration changes we might make to
>> improve the performance at higher "shared_buffers" values?
>>
>> Thanks, let me know if I can provide any more info,
>> Jordan
>>
>> - Tests and results - public Datadog dashboard here
>> <https://p.datadoghq.com/sb/0d34b3451-8bde042f82c012981b94796cdc26e259>,
>> screenshot attached:
>> - Our beta system ("endor") was run with three different
>> configurations over the ~30hrs from Dec 11 17:00 to Dec 13 0:00 (UTC)
>> - The only changes between these deployments was the
>> "shared_buffers" parameter for all PSQL instances (machine and
>> configuration details below).
>> - "shared_buffers" = "4000MB" - from Dec 10 19:00 to Dec 11
>> 20:00 UTC
>> - "shared_buffers" = "8000MB" - from Dec 11 21:00 to Dec 12
>> 13:30 UTC
>> - "shared_buffers" = "14000MB" - from Dec 12, 14:30 to Dec 13,
>> 0:00 UTC
>> - The datadog dashboard
>> <https://p.datadoghq.com/sb/0d34b3451-8bde042f82c012981b94796cdc26e259>
>> shows our results including cpu divided by usage and the cache hit vs disk
>> read ratio including blk_read_time (additional metrics were enabled at
>> about Dec 11, 10am PST)
>> - Our most query heavy service is our "Trends worker" for which
>> the average worker duration is shown in the top-left graph
>> - We expect the workload to be relatively constant
>> throughout this period, particularly focusing on the standby instances
>> (PQSL2 and PSQL3) where all read-only queries should be sent.
>> - We see the lowest duration, i.e. best performance, most
>> consistently with the lowest setting for shared_buffers, "4000MB"
>> - As we increase shared_buffers we see increased iowait on the
>> standby instances (PSQL2 and PSQL3) and increased blk_read_time
>> (per pg_stat_database), in the bottom-most graphs as "blks_read_time".
>> - Even though we also see a higher ratio of cache hits on
>> those instances. Our graphs show the per second change
>> in pg_stat_database.blks_read abd blks_hit (as "all_hit/s" and
>> "all_read/s") and pg_statio_user_tables.heap_blks_read, heap_blks_hit,
>> idx_blks_read, and idx_blks_hit
>> - Cluster contains 3 PSQL nodes, all on AWS EC2 instances,
>> postgresql.conf attached
>> - Version: psql (PostgreSQL) 14.1
>> - Machine:
>> - AWS "c6gd.4xlarge" (32GB RAM, 16 core 2.5 GHz, local storage
>> 950 GB ssd)
>> - uname -a: Linux ip-172-30-64-110 5.4.0-1038-aws #40-Ubuntu
>> SMP Fri Feb 5 23:53:34 UTC 2021 aarch64 aarch64 aarch64 GNU/Linux
>> - Replication via WAL:
>> - Line configuration: PSQL1 (master), PSQL1 followed by PSQL2,
>> PSQL2 followed by PSQL3
>> - Managed by repmgr (version: repmgr 5.3.0), no failovers
>> observed during timeframe of interest
>> - Load balancing:
>> - Managed by PGPool-II (version: 4.3.2 (tamahomeboshi)) on 3
>> AWS instances
>> - All write queries go to master. All read-only queries go to
>> standbys unless WAL on standby > 10MB, falling back to read from master as
>> last resort
>>
>>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2022-12-14 18:27:23 Re: Increased iowait and blk_read_time with higher shared_buffers
Previous Message Samed YILDIRIM 2022-12-14 12:38:10 Re: Increased iowait and blk_read_time with higher shared_buffers