Better understanding checkpoint logs

From: Wells Oliver <wells(dot)oliver(at)gmail(dot)com>
To: pgsql-admin <pgsql-admin(at)postgresql(dot)org>
Subject: Better understanding checkpoint logs
Date: 2023-06-22 01:53:25
Message-ID: CAOC+FBXfEhAuywFrGin+c87FedzNAivrqxYJ9k7NnT3CrEGMAw@mail.gmail.com
Lists: pgsql-admin

Hi all. Trying to assess our WAL checkpointing during heavy ETL loads.
checkpoint_timeout is at 5min and it seems to be checkpointing almost
constantly; maybe that's OK. We also see the percentage of buffers written
climb in the logs while these loads run.
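
For context, this is roughly what I check first to see how often
checkpoints are being forced by WAL volume rather than by the timer (a
minimal sketch; pg_stat_bgwriter is where these counters live on our
version, if I have that right):

    -- Scheduled vs. WAL-forced checkpoints since the last stats reset:
    SELECT checkpoints_timed, checkpoints_req FROM pg_stat_bgwriter;

    -- The settings in play:
    SHOW checkpoint_timeout;
    SHOW max_wal_size;
    SHOW checkpoint_completion_target;

Here's a run from the log (newest entries first):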

2023-06-21 12:20:02 UTC::@:[396]:LOG: checkpoint complete: wrote 88523 buffers (4.4%); 0 WAL file(s) added, 0 removed, 29 recycled; write=268.929 s, sync=0.288 s, total=269.395 s; sync files=177, longest=0.178 s, average=0.002 s; distance=2368871 kB, estimate=4720836 kB
2023-06-21 12:15:32 UTC::@:[396]:LOG: checkpoint starting: time
2023-06-21 12:15:02 UTC::@:[396]:LOG: checkpoint complete: wrote 63434 buffers (3.1%); 0 WAL file(s) added, 0 removed, 43 recycled; write=269.279 s, sync=0.676 s, total=270.448 s; sync files=220, longest=0.487 s, average=0.004 s; distance=2830955 kB, estimate=4982166 kB
2023-06-21 12:10:32 UTC::@:[396]:LOG: checkpoint starting: time
2023-06-21 12:10:02 UTC::@:[396]:LOG: checkpoint complete: wrote 42229 buffers (2.1%); 0 WAL file(s) added, 0 removed, 45 recycled; write=269.272 s, sync=0.167 s, total=269.642 s; sync files=224, longest=0.131 s, average=0.001 s; distance=2923597 kB, estimate=5221189 kB
2023-06-21 12:05:32 UTC::@:[396]:LOG: checkpoint starting: time
2023-06-21 12:05:03 UTC::@:[396]:LOG: checkpoint complete: wrote 249580 buffers (12.4%); 0 WAL file(s) added, 0 removed, 79 recycled; write=269.868 s, sync=1.333 s, total=271.434 s; sync files=256, longest=0.375 s, average=0.006 s; distance=5209546 kB, estimate=5476477 kB
2023-06-21 12:00:32 UTC::@:[396]:LOG: checkpoint starting: time
2023-06-21 11:59:45 UTC::@:[396]:LOG: checkpoint complete: wrote 357329 buffers (17.7%); 0 WAL file(s) added, 1 removed, 83 recycled; write=249.722 s, sync=2.006 s, total=252.201 s; sync files=137, longest=0.849 s, average=0.015 s; distance=5504554 kB, estimate=5506136 kB
2023-06-21 11:55:32 UTC::@:[396]:LOG: checkpoint starting: wal
2023-06-21 11:55:13 UTC::@:[396]:LOG: checkpoint complete: wrote 470167 buffers (23.3%); 0 WAL file(s) added, 0 removed, 84 recycled; write=250.553 s, sync=0.026 s, total=250.705 s; sync files=193, longest=0.006 s, average=0.001 s; distance=5498807 kB, estimate=5506312 kB
2023-06-21 11:51:02 UTC::@:[396]:LOG: checkpoint starting: wal
2023-06-21 11:50:44 UTC::@:[396]:LOG: checkpoint complete: wrote 437633 buffers (21.7%); 0 WAL file(s) added, 0 removed, 84 recycled; write=157.932 s, sync=0.005 s, total=158.886 s; sync files=197, longest=0.004 s, average=0.001 s; distance=5507145 kB, estimate=5507145 kB
2023-06-21 11:48:05 UTC::@:[396]:LOG: checkpoint starting: wal
2023-06-21 11:47:57 UTC::@:[396]:LOG: checkpoint complete: wrote 338877 buffers (16.8%); 0 WAL file(s) added, 0 removed, 84 recycled; write=240.248 s, sync=0.042 s, total=240.928 s; sync files=164, longest=0.020 s, average=0.001 s; distance=5453945 kB, estimate=5453945 kB
2023-06-21 11:43:56 UTC::@:[396]:LOG: checkpoint starting: wal
2023-06-21 11:43:32 UTC::@:[396]:LOG: checkpoint complete: wrote 712593 buffers (35.3%); 0 WAL file(s) added, 15 removed, 42 recycled; write=269.229 s, sync=0.004 s, total=269.362 s; sync files=69, longest=0.003 s, average=0.001 s; distance=3454836 kB, estimate=3625475 kB
2023-06-21 11:39:02 UTC::@:[396]:LOG: checkpoint starting: time
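
For what it's worth, my back-of-envelope reading (and I may have the
trigger logic wrong): a WAL-forced checkpoint fires when the WAL written
since the last one reaches roughly max_wal_size / (2 +
checkpoint_completion_target), so the ~5.5 GB distances on the "checkpoint
starting: wal" entries would back out to something like:

    -- Hypothetical inversion of the trigger threshold, assuming
    -- checkpoint_completion_target = 0.9 and distance = 5506136 kB:
    SELECT pg_size_pretty((5506136::bigint * 1024 * 2.9)::bigint);
    -- ~15 GB, i.e. roughly our max_wal_size, if that formula holds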

I don't want to read too much into it, but I am trying to understand the
logs well enough to decide whether to raise max_wal_size, or increase
checkpoint_timeout, if the frequent checkpoints are making the ETL runs
slower than they should be. Our wal_level is currently set to logical, but
we plan to change it to replica soon, as logical feels like overkill for
us.
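
If we do end up adjusting things, I assume it would look something like
the following (the values are placeholders, not settled choices):

    -- checkpoint_timeout and max_wal_size only need a reload:
    ALTER SYSTEM SET checkpoint_timeout = '15min';
    ALTER SYSTEM SET max_wal_size = '16GB';
    SELECT pg_reload_conf();

    -- wal_level, on the other hand, requires a full server restart:
    ALTER SYSTEM SET wal_level = 'replica';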

--
Wells Oliver
wells(dot)oliver(at)gmail(dot)com
