Re: incremental backup mishandles XLOG_DBASE_CREATE_FILE_COPY

From: Noah Misch <noah(at)leadboat(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: incremental backup mishandles XLOG_DBASE_CREATE_FILE_COPY
Date: 2024-02-24 04:35:13
Message-ID: 20240224043513.00.nmisch@google.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Feb 23, 2024 at 08:47:52PM +0530, Robert Haas wrote:
> If XLOG_DBASE_CREATE_FILE_COPY occurs between an incremental backup
> and its reference backup, every relation whose DB OID and tablespace
> OID match the corresponding values in that record should be backed up
> in full. Currently that's not happening, because the WAL summarizer
> doesn't see the XLOG_DBASE_CREATE_FILE_COPY as referencing any
> particular relfilenode and so basically ignores it. The same happens
> for XLOG_DBASE_CREATE_WAL_LOG, but that case is OK because that only
> covers creating the directory itself, not anything underneath it, and
> there will be separate WAL records telling us the relfilenodes created
> below the new directory and the pages modified therein.

XLOG_DBASE_CREATE_WAL_LOG creates PG_VERSION in addition to creating the
directory. I see your patch covers it.

> I thought about whether there were any other WAL records that have
> similar problems to XLOG_DBASE_CREATE_FILE_COPY and didn't come up
> with anything. If anyone knows of any similar cases, please let me
> know.

Regarding records the summarizer potentially can't ignore that don't deal in
relfilenodes, these come to mind:

XLOG_DBASE_DROP - covered in this thread's patch
XLOG_RELMAP_UPDATE
XLOG_TBLSPC_CREATE
XLOG_TBLSPC_DROP
XLOG_XACT_PREPARE

Also, any record that writes XIDs needs to update nextXid; likewise for other
ID spaces. See the comment at "XLOG stuff" in heap_lock_tuple(). Perhaps you
don't summarize past a checkpoint, making that irrelevant.

If walsummarizer.c handles any of the above, my brief look missed it. I also
didn't find the string "clog" or "slru" anywhere in dc21234 "Add support for
incremental backup", 174c480 "Add a new WAL summarizer process.", or thread
https://postgr.es/m/flat/CA%2BTgmoYOYZfMCyOXFyC-P%2B-mdrZqm5pP2N7S-r0z3_402h9rsA%40mail.gmail.com
"trying again to get incremental backup". I wouldn't be surprised if you
treat clog, pg_filenode.map, and/or 2PC state files as unconditionally
non-incremental, in which case some of the above doesn't need explicit
summarization code. I stopped looking for that logic, though.

> --- a/src/backend/postmaster/walsummarizer.c
> +++ b/src/backend/postmaster/walsummarizer.c

> + * Technically, this special handling is only needed in the case of
> + * XLOG_DBASE_CREATE_FILE_COPY, because that can create a whole bunch
> + * of relation files in a directory without logging anything
> + * specific to each one. If we didn't mark the whole DB OID/TS OID
> + * combination in some way, then a tablespace that was dropped after
s/tablespace/database/ I suspect.
> + * the reference backup and recreated using the FILE_COPY method prior
> + * to the incremental backup would look just like one that was never
> + * touched at all, which would be catastrophic.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2024-02-24 05:12:05 Re: Removing unneeded self joins
Previous Message Amit Kapila 2024-02-24 01:35:21 Re: Why is subscription/t/031_column_list.pl failing so much?