pgsql: Wait for WAL summarization to catch up before creating .partial

From: Robert Haas <rhaas(at)postgresql(dot)org>
To: pgsql-committers(at)lists(dot)postgresql(dot)org
Subject: pgsql: Wait for WAL summarization to catch up before creating .partial
Date: 2024-07-26 19:01:45
Message-ID: E1sXQCH-001Lc6-Hd@gemulon.postgresql.org
Lists: pgsql-committers

Wait for WAL summarization to catch up before creating .partial file.

When a standby is promoted, CleanupAfterArchiveRecovery() may decide
to rename the final WAL file from the old timeline by adding ".partial"
to the name. If WAL summarization is enabled and this file is renamed
before its partial contents are summarized, WAL summarization breaks:
the summarizer gets stuck at that point in the WAL stream and just
errors out.

To fix that, first make the startup process wait for WAL summarization
to catch up before renaming the file. Generally, this should be quick,
and if it's not, the user can shut off summarize_wal and try again.
To make this fix work, also teach the WAL summarizer that after a
promotion has occurred, no more WAL can appear on the previous
timeline: previously, the WAL summarizer wouldn't switch to the new
timeline until we actually started writing WAL there, but that meant
that when the startup process was waiting for the WAL summarizer, it
was waiting for an action that the summarizer wasn't yet prepared to
take.
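
The interaction described above can be illustrated with a toy model. This is only a sketch under stated assumptions: every name in it (ToySummarizer, toy_summarizer_cycle, toy_wait_for_summarization, the fixed chunk size) is hypothetical and does not appear in the PostgreSQL source. The real summarizer is a separate process with more involved rules, while this model runs both sides in a single loop. It shows only why the waiter cannot succeed unless the summarizer knows the old timeline is finished.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins; none of these names come from the PostgreSQL source. */
typedef uint64_t ToyLsn;

typedef struct ToySummarizer
{
    ToyLsn summarized_upto;     /* WAL before this position is summarized */
    ToyLsn old_timeline_end;    /* where WAL on the old timeline stops */
    bool   old_timeline_closed; /* summarizer knows no more WAL can appear */
} ToySummarizer;

/* One summarizer cycle; returns true if it made progress. */
static bool
toy_summarizer_cycle(ToySummarizer *s, ToyLsn chunk)
{
    ToyLsn remaining;

    assert(chunk > 0);

    if (s->summarized_upto >= s->old_timeline_end)
        return false;
    remaining = s->old_timeline_end - s->summarized_upto;

    /*
     * Toy rule (an assumption of this sketch): a trailing piece shorter
     * than a full chunk is summarized only once the old timeline is known
     * to be closed; otherwise the summarizer keeps waiting for more WAL.
     */
    if (remaining < chunk && !s->old_timeline_closed)
        return false;

    s->summarized_upto += (remaining < chunk) ? remaining : chunk;
    return true;
}

/* Startup-side wait: poll until caught up or out of attempts. */
static bool
toy_wait_for_summarization(ToySummarizer *s, ToyLsn target,
                           ToyLsn chunk, int max_cycles)
{
    for (int i = 0; i < max_cycles; i++)
    {
        if (s->summarized_upto >= target)
            return true;
        (void) toy_summarizer_cycle(s, chunk);
    }
    return s->summarized_upto >= target;
}
```

In this model, a waiter that targets the end of the old timeline never succeeds while old_timeline_closed is false, because the summarizer stops short of the partial tail. Once the flag is set, the same wait completes in a few cycles, and only then would the rename to ".partial" be safe.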

In the process of fixing these bugs, I realized that the logic to wait
for WAL summarization to catch up was spread out in a way that made
it difficult to reuse properly, so this commit refactors that logic to
make reuse easier.

Finally, add a test case that would have caught this bug and the
previously-fixed bug that WAL summarization sometimes needs to back up
when the timeline changes.

Discussion: https://postgr.es/m/CA+TgmoZGEsZodXC4f=XZNkAeyuDmWTSkpkjCEOcF19Am0mt_OA@mail.gmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/8a53539bd603e5fe8fa52bdbb7277f6f49724522

Modified Files
--------------
src/backend/access/transam/xlog.c           |  33 +++++++
src/backend/backup/basebackup_incremental.c |  90 ++----------------
src/backend/postmaster/walsummarizer.c      | 142 +++++++++++++++++++++++-----
src/bin/pg_combinebackup/meson.build        |   1 +
src/bin/pg_combinebackup/t/008_promote.pl   |  81 ++++++++++++++++
src/include/access/xlog.h                   |   1 +
src/include/postmaster/walsummarizer.h      |   3 +-
7 files changed, 241 insertions(+), 110 deletions(-)
