Better understanding checkpoint logs

From: Wells Oliver <wells(dot)oliver(at)gmail(dot)com>
To: pgsql-admin <pgsql-admin(at)postgresql(dot)org>
Subject: Better understanding checkpoint logs
Date: 2023-06-22 01:53:25
Message-ID: CAOC+FBXfEhAuywFrGin+c87FedzNAivrqxYJ9k7NnT3CrEGMAw@mail.gmail.com
Lists: pgsql-admin

Hi all. Trying to assess our WAL checkpointing during heavy ETL loads.
checkpoint_timeout is at 5min and it seems to be checkpointing almost
constantly; maybe that's OK. We also see the percentage of buffers written
climb in the logs while these loads run.
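
For context, this is roughly what I check first to see how often
checkpoints are being forced by WAL volume rather than by the timer (a
minimal sketch; pg_stat_bgwriter is where these counters live on our
version, if I have that right):

    -- Scheduled vs. WAL-forced checkpoints since the last stats reset:
    SELECT checkpoints_timed, checkpoints_req FROM pg_stat_bgwriter;

    -- The settings in play:
    SHOW checkpoint_timeout;
    SHOW max_wal_size;
    SHOW checkpoint_completion_target;

Here's a run from the log (newest entries first):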

2023-06-21 12:20:02 UTC::@:[396]:LOG: checkpoint complete: wrote 88523 buffers (4.4%); 0 WAL file(s) added, 0 removed, 29 recycled; write=268.929 s, sync=0.288 s, total=269.395 s; sync files=177, longest=0.178 s, average=0.002 s; distance=2368871 kB, estimate=4720836 kB
2023-06-21 12:15:32 UTC::@:[396]:LOG: checkpoint starting: time
2023-06-21 12:15:02 UTC::@:[396]:LOG: checkpoint complete: wrote 63434 buffers (3.1%); 0 WAL file(s) added, 0 removed, 43 recycled; write=269.279 s, sync=0.676 s, total=270.448 s; sync files=220, longest=0.487 s, average=0.004 s; distance=2830955 kB, estimate=4982166 kB
2023-06-21 12:10:32 UTC::@:[396]:LOG: checkpoint starting: time
2023-06-21 12:10:02 UTC::@:[396]:LOG: checkpoint complete: wrote 42229 buffers (2.1%); 0 WAL file(s) added, 0 removed, 45 recycled; write=269.272 s, sync=0.167 s, total=269.642 s; sync files=224, longest=0.131 s, average=0.001 s; distance=2923597 kB, estimate=5221189 kB
2023-06-21 12:05:32 UTC::@:[396]:LOG: checkpoint starting: time
2023-06-21 12:05:03 UTC::@:[396]:LOG: checkpoint complete: wrote 249580 buffers (12.4%); 0 WAL file(s) added, 0 removed, 79 recycled; write=269.868 s, sync=1.333 s, total=271.434 s; sync files=256, longest=0.375 s, average=0.006 s; distance=5209546 kB, estimate=5476477 kB
2023-06-21 12:00:32 UTC::@:[396]:LOG: checkpoint starting: time
2023-06-21 11:59:45 UTC::@:[396]:LOG: checkpoint complete: wrote 357329 buffers (17.7%); 0 WAL file(s) added, 1 removed, 83 recycled; write=249.722 s, sync=2.006 s, total=252.201 s; sync files=137, longest=0.849 s, average=0.015 s; distance=5504554 kB, estimate=5506136 kB
2023-06-21 11:55:32 UTC::@:[396]:LOG: checkpoint starting: wal
2023-06-21 11:55:13 UTC::@:[396]:LOG: checkpoint complete: wrote 470167 buffers (23.3%); 0 WAL file(s) added, 0 removed, 84 recycled; write=250.553 s, sync=0.026 s, total=250.705 s; sync files=193, longest=0.006 s, average=0.001 s; distance=5498807 kB, estimate=5506312 kB
2023-06-21 11:51:02 UTC::@:[396]:LOG: checkpoint starting: wal
2023-06-21 11:50:44 UTC::@:[396]:LOG: checkpoint complete: wrote 437633 buffers (21.7%); 0 WAL file(s) added, 0 removed, 84 recycled; write=157.932 s, sync=0.005 s, total=158.886 s; sync files=197, longest=0.004 s, average=0.001 s; distance=5507145 kB, estimate=5507145 kB
2023-06-21 11:48:05 UTC::@:[396]:LOG: checkpoint starting: wal
2023-06-21 11:47:57 UTC::@:[396]:LOG: checkpoint complete: wrote 338877 buffers (16.8%); 0 WAL file(s) added, 0 removed, 84 recycled; write=240.248 s, sync=0.042 s, total=240.928 s; sync files=164, longest=0.020 s, average=0.001 s; distance=5453945 kB, estimate=5453945 kB
2023-06-21 11:43:56 UTC::@:[396]:LOG: checkpoint starting: wal
2023-06-21 11:43:32 UTC::@:[396]:LOG: checkpoint complete: wrote 712593 buffers (35.3%); 0 WAL file(s) added, 15 removed, 42 recycled; write=269.229 s, sync=0.004 s, total=269.362 s; sync files=69, longest=0.003 s, average=0.001 s; distance=3454836 kB, estimate=3625475 kB
2023-06-21 11:39:02 UTC::@:[396]:LOG: checkpoint starting: time
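
For what it's worth, my back-of-envelope reading (and I may have the
trigger logic wrong): a WAL-forced checkpoint fires when the WAL written
since the last one reaches roughly max_wal_size / (2 +
checkpoint_completion_target), so the ~5.5 GB distances on the "checkpoint
starting: wal" entries would back out to something like:

    -- Hypothetical inversion of the trigger threshold, assuming
    -- checkpoint_completion_target = 0.9 and distance = 5506136 kB:
    SELECT pg_size_pretty((5506136::bigint * 1024 * 2.9)::bigint);
    -- ~15 GB, i.e. roughly our max_wal_size, if that formula holds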

I don't want to read too much into it, but I am trying to understand the
logs well enough to decide whether to raise max_wal_size, or increase
checkpoint_timeout, if the frequent checkpoints are making the ETL runs
slower than they should be. Our wal_level is currently set to logical, but
we plan to change it to replica soon, as logical feels like overkill for
us.
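
If we do end up adjusting things, I assume it would look something like
the following (the values are placeholders, not settled choices):

    -- checkpoint_timeout and max_wal_size only need a reload:
    ALTER SYSTEM SET checkpoint_timeout = '15min';
    ALTER SYSTEM SET max_wal_size = '16GB';
    SELECT pg_reload_conf();

    -- wal_level, on the other hand, requires a full server restart:
    ALTER SYSTEM SET wal_level = 'replica';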

--
Wells Oliver
wells(dot)oliver(at)gmail(dot)com
