From: Vijaykumar Jain <vijaykumarjain(dot)github(at)gmail(dot)com>
To: Don Seiler <don(at)seiler(dot)us>
Cc: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, pgsql-admin <pgsql-admin(at)postgresql(dot)org>
Subject: Re: WALWriteLocks
Date: 2021-05-24 15:38:13
Message-ID: CAM+6J974Opbd5JWVqaSVKesp=qqbu7+7Y=fgNJov5sKy8oczEA@mail.gmail.com
Lists: pgsql-admin
Thanks.
For my reference:
Virtual machine and disk performance - Azure Virtual Machines | Microsoft Docs
<https://docs.microsoft.com/en-us/azure/virtual-machines/disks-performance>
Sorry, I just want to make use of all the info; I have less experience here. :)
Do you generate large WALs?
If your checkpoint_completion_target is aggressive (a low value), each
checkpoint finishes quickly and causes a burst of I/O.
If the checkpoint writes are instead spread over a larger time window, that
may reduce the frequent I/O pressure.
But to confirm whether checkpoints are what is hogging the IOPS,
log_checkpoints needs to be turned on first.
Basics of Tuning Checkpoints - 2ndQuadrant | PostgreSQL
<https://www.2ndquadrant.com/en/blog/basics-of-tuning-checkpoints/>
Tuning Input/Output (I/O) Operations for PostgreSQL | Severalnines
<https://severalnines.com/blog/tuning-io-operations-postgresql>
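For reference, the knobs those posts talk about boil down to a few settings in postgresql.conf. The values below are only illustrative, not recommendations for your box:

```ini
# postgresql.conf -- illustrative values only, tune for your workload
log_checkpoints = on                 # first, confirm checkpoints line up with the I/O spikes
checkpoint_timeout = 15min           # checkpoint less often ...
max_wal_size = 8GB                   # ... and avoid early checkpoints triggered by WAL volume
checkpoint_completion_target = 0.9   # spread checkpoint writes over ~90% of the interval
```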
I did use the above once to spread checkpoints and reduce I/O stress, but it
meant that recovery after a crash would take longer.
But OK, I guess I am diverging. I'll see if I can simulate the exact
messages.
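If it helps, here is a rough Python sketch to flag the spike rows in captured iostat -x output instead of eyeballing them. The thresholds are arbitrary, and the column names assume the extended-statistics header in your paste:

```python
# Flag devices whose w_await or aqu-sz exceed a threshold in `iostat -x` output.
# Assumes the extended-statistics header format:
# Device r/s w/s rMB/s wMB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz ...

def flag_spikes(iostat_text, w_await_max=10.0, aqu_sz_max=2.0):
    """Return (device, w_await, aqu_sz) tuples for rows breaching a threshold."""
    spikes = []
    cols = None
    for line in iostat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Device":
            # Remember column positions from the header row.
            cols = {name: i for i, name in enumerate(fields)}
            continue
        if cols is None or len(fields) != len(cols):
            continue  # skip avg-cpu lines, timestamps, and malformed rows
        try:
            w_await = float(fields[cols["w_await"]])
            aqu_sz = float(fields[cols["aqu-sz"]])
        except (KeyError, ValueError):
            continue
        if w_await > w_await_max or aqu_sz > aqu_sz_max:
            spikes.append((fields[0], w_await, aqu_sz))
    return spikes
```

On the 01:45:57 sample below, for example, this would flag sdc and dm-0 (w_await ~38, aqu-sz over 100) while leaving sdd alone.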
On Mon, 24 May 2021 at 20:35, Don Seiler <don(at)seiler(dot)us> wrote:
> Not resolved. The WAL was already on SSD, we further moved it to a
> separate SSD (Azure's Ultra SSD) and that seemed to lessen the
> severity/frequency but we still see occasional spikes over the course of
> any given day.
>
> What is actually interesting though is now that I've had iostat running,
> when we see the WALWriteLock spikes, we see the w_await and aqu-sz spike on
> the data directory disk, not the WAL disk. See below for an example (first
> two readings for baseline, then spike). sdc is the PG data directory disk
> (Azure Premium SSD), and dm-0 is an LVM lun wrapped around it. sdd is the
> WAL disk (Azure Ultra SSD). The data disk is set to ReadOnly host caching
> per Azure's recommendations; the WAL disk is set to None (per the
> recommendations for log disks, and Ultra SSD doesn't allow host caching
> anyway).
>
> 05/12/2021 01:45:47 PM
> avg-cpu: %user %nice %system %iowait %steal %idle
> 17.77 0.00 3.61 1.67 0.00 76.95
>
> Device   r/s      w/s      rMB/s  wMB/s  rrqm/s  wrqm/s  %rrqm  %wrqm  r_await  w_await  aqu-sz  rareq-sz  wareq-sz  svctm  %util
> loop0    0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00   0.00     0.00     0.00    0.00      0.00      0.00   0.00
> sda      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00   0.00     0.00     0.00    0.00      0.00      0.00   0.00
> sdb      0.00     3.20     0.00   0.07   0.00    14.60   0.00   82.02  0.00     1.69     0.00    0.00      22.25     3.25   1.04
> sdc      1180.00  182.80   11.39  1.41   0.00    0.00    0.00   0.00   0.71     5.22     0.77    9.88      7.93      0.67   91.20
> sdd      0.00     1290.00  0.00   19.82  0.00    250.00  0.00   16.23  0.00     0.46     0.01    0.00      15.74     0.76   98.56
> dm-0     1180.40  182.80   11.39  1.41   0.00    0.00    0.00   0.00   0.74     5.30     1.84    9.88      7.93      0.67   91.28
>
> 05/12/2021 01:45:52 PM
> avg-cpu: %user %nice %system %iowait %steal %idle
> 18.71 0.00 4.02 1.54 0.00 75.74
>
> Device   r/s      w/s      rMB/s  wMB/s  rrqm/s  wrqm/s  %rrqm  %wrqm  r_await  w_await  aqu-sz  rareq-sz  wareq-sz  svctm  %util
> loop0    0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00   0.00     0.00     0.00    0.00      0.00      0.00   0.00
> sda      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00   0.00     0.00     0.00    0.00      0.00      0.00   0.00
> sdb      0.00     3.40     0.00   0.07   0.00    15.00   0.00   81.52  0.00     1.88     0.00    0.00      21.65     3.53   1.20
> sdc      1035.20  118.60   10.55  1.05   0.00    26.80   0.00   18.43  0.67     5.27     0.50    10.44     9.07      0.81   93.44
> sdd      0.00     1298.00  0.00   21.64  0.00    264.40  0.00   16.92  0.00     0.49     0.01    0.00      17.08     0.76   98.96
> dm-0     1035.20  145.40   10.55  1.05   0.00    0.00    0.00   0.00   0.69     5.49     1.52    10.44     7.39      0.79   93.44
>
> 05/12/2021 01:45:57 PM
> avg-cpu: %user %nice %system %iowait %steal %idle
> 17.85 0.00 4.34 10.93 0.00 66.88
>
> Device   r/s      w/s      rMB/s  wMB/s  rrqm/s  wrqm/s  %rrqm  %wrqm  r_await  w_await  aqu-sz  rareq-sz  wareq-sz  svctm  %util
> loop0    0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00   0.00     0.00     0.00    0.00      0.00      0.00   0.00
> sda      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00   0.00     0.00     0.00    0.00      0.00      0.00   0.00
> sdb      0.00     3.80     0.00   0.07   0.00    14.60   0.00   79.35  0.00     2.26     0.00    0.00      20.00     3.58   1.36
> sdc      983.20   2831.80  8.98   36.91  0.00    0.00    0.00   0.00   6.61     38.56    108.97  9.35      13.35     0.25   94.80
> sdd      0.00     1246.20  0.00   21.24  0.00    249.00  0.00   16.65  0.00     0.49     0.03    0.00      17.45     0.79   98.48
> dm-0     982.80   2831.60  8.98   36.91  0.00    0.00    0.00   0.00   6.58     38.66    115.93  9.35      13.35     0.25   94.88
>
> 05/12/2021 01:46:02 PM
> avg-cpu: %user %nice %system %iowait %steal %idle
> 19.40 0.00 4.48 2.53 0.00 73.59
>
> Device   r/s      w/s      rMB/s  wMB/s  rrqm/s  wrqm/s  %rrqm  %wrqm  r_await  w_await  aqu-sz  rareq-sz  wareq-sz  svctm  %util
> loop0    0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00   0.00     0.00     0.00    0.00      0.00      0.00   0.00
> sda      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00   0.00     0.00     0.00    0.00      0.00      0.00   0.00
> sdb      2.20     4.00     0.04   0.09   0.00    20.00   0.00   83.33  0.36     2.20     0.00    17.45     24.00     2.32   1.44
> sdc      915.00   2592.60  8.31   23.10  0.00    44.20   0.00   1.68   1.20     16.35    37.69   9.30      9.13      0.27   94.64
> sdd      0.00     1234.20  0.00   24.99  0.00    253.60  0.00   17.05  0.00     0.54     0.02    0.00      20.74     0.80   99.20
> dm-0     915.60   2636.80  8.31   23.10  0.00    0.00    0.00   0.00   1.20     16.19    43.80   9.30      8.97      0.27   94.64
>
> On Mon, May 24, 2021 at 9:13 AM Vijaykumar Jain <
> vijaykumarjain(dot)github(at)gmail(dot)com> wrote:
>
>> Hey Don,
>>
>> Was this resolved?
>> Were there any changes after moving WAL to SSD?
>>
>> On Wed, May 5, 2021, 10:14 PM Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
>> wrote:
>>
>>> On Wed, 2021-05-05 at 11:25 -0500, Don Seiler wrote:
>>> > What I want to know is if there are other events that would contribute
>>> > to a WALWriteLock (eg writing to a data file or the postgresql.log
>>> file)
>>> > or is it 100% writing to the WAL file.
>>>
>>> You don't contribute to a lock...
>>>
>>> The documentation says
>>> (
>>> https://www.postgresql.org/docs/13/monitoring-stats.html#WAIT-EVENT-LOCK-TABLE
>>> ):
>>>
>>> > WALWrite Waiting for WAL buffers to be written to disk.
>>>
>>> So it is contention while writing or flushing WAL.
>>> You can read the code for the details.
>>>
>>> Yours,
>>> Laurenz Albe
>>> --
>>> Cybertec | https://www.cybertec-postgresql.com
>>>
>>>
>
> --
> Don Seiler
> www.seiler.us
>
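PS: next time a spike hits, it may also be worth sampling pg_stat_activity to confirm that backends really are piling up on that wait event. A rough query sketch (wait event names as of PostgreSQL 13):

```sql
-- Count backends currently stuck on WAL-related LWLock wait events.
SELECT wait_event, count(*)
FROM pg_stat_activity
WHERE wait_event_type = 'LWLock'
  AND wait_event IN ('WALWrite', 'WALInsert', 'WALBufMapping')
GROUP BY wait_event;
```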
--
Thanks,
Vijay
Mumbai, India