From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | incremental backup mishandles XLOG_DBASE_CREATE_FILE_COPY |
Date: | 2024-02-23 15:17:52 |
Message-ID: | CA+Tgmob0xa=ByvGLMdAgkUZyVQE=r4nyYZ_VEa40FCfEDFnTKA@mail.gmail.com |
Lists: | pgsql-hackers |

If XLOG_DBASE_CREATE_FILE_COPY occurs between an incremental backup
and its reference backup, every relation whose DB OID and tablespace
OID match the corresponding values in that record should be backed up
in full. Currently that's not happening, because the WAL summarizer
doesn't see an XLOG_DBASE_CREATE_FILE_COPY record as referencing any
particular relfilenode and so basically ignores it. The same happens
for XLOG_DBASE_CREATE_WAL_LOG, but that case is OK because that only
covers creating the directory itself, not anything underneath it, and
there will be separate WAL records telling us the relfilenodes created
below the new directory and the pages modified therein.

AFAICS, fixing this requires some way of noting in the WAL summary
file that an entire directory got blown away. I chose to do that by
setting the limit block to 0 for a fake relation with the given DB OID
and TS OID and relfilenumber 0, which seems natural. Patch with test
case attached. The test case in brief is:

initdb -c summarize_wal=on
# start the server in $PGDATA
psql -c 'create database lakh oid = 100000 strategy = file_copy' postgres
psql -c 'create table t1 (a int)' lakh
pg_basebackup -cfast -Dt1
dropdb lakh
psql -c 'create database lakh oid = 100000 strategy = file_copy' postgres
pg_basebackup -cfast -Dt2 --incremental t1/backup_manifest
pg_combinebackup t1 t2 -o result
# stop the server, restart from the result directory
psql -c 'select * from t1' lakh

Without this patch, you get something like:

ERROR: could not open file "base/100000/16388": No such file or directory

...because the catalog entries from before the database was dropped and
recreated manage to end up in pg_combinebackup's output directory,
which they should not.

With the patch, you correctly get an error about t1 not existing.

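To make the limit-block idea above a bit more concrete, here is a rough toy model in Python. It is only a sketch of the concept, not the patch itself: the class and method names (ToySummary, note_database_file_copy, needs_full_backup) are made up for illustration and do not correspond to PostgreSQL's actual block reference table code, and the tablespace OID 1663 is just an example value. The model assumes that a limit block of 0 means nothing from the reference backup can be reused for that relation.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

# (db_oid, ts_oid, relfilenumber); hypothetical key layout for this sketch.
RelKey = Tuple[int, int, int]


@dataclass
class ToySummary:
    """Toy stand-in for a WAL summary; not PostgreSQL's real data structure."""

    # Lowest block number from which earlier contents can no longer be trusted.
    limit_block: Dict[RelKey, int] = field(default_factory=dict)
    # Blocks modified during the summarized WAL range.
    modified_blocks: Dict[RelKey, Set[int]] = field(default_factory=dict)

    def note_block_modified(self, rel: RelKey, blkno: int) -> None:
        self.modified_blocks.setdefault(rel, set()).add(blkno)

    def note_database_file_copy(self, db_oid: int, ts_oid: int) -> None:
        # Record "the whole directory was replaced" as a fake relation with
        # relfilenumber 0 and limit block 0 for this database and tablespace.
        self.limit_block[(db_oid, ts_oid, 0)] = 0

    def needs_full_backup(self, rel: RelKey) -> bool:
        db_oid, ts_oid, _ = rel
        # A directory-wide marker forces every relation in it to be sent in full.
        if self.limit_block.get((db_oid, ts_oid, 0)) == 0:
            return True
        # Otherwise only a per-relation limit block of 0 forces a full copy.
        return self.limit_block.get(rel) == 0


summary = ToySummary()
old_rel = (100000, 1663, 16388)  # example OIDs; 16388 is from the error above
summary.note_block_modified(old_rel, 3)
print(summary.needs_full_backup(old_rel))  # before the file-copy record is seen

summary.note_database_file_copy(100000, 1663)
print(summary.needs_full_backup(old_rel))  # after the file-copy record is seen
```

The point of the fake relfilenumber-0 entry in this sketch is that one marker covers every relation under the same database and tablespace, including ones the summarizer never saw an individual WAL record for.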
I thought about whether there were any other WAL records that have
similar problems to XLOG_DBASE_CREATE_FILE_COPY and didn't come up
with anything. If anyone knows of any similar cases, please let me
know.

Thanks,

--
Robert Haas
EDB: http://www.enterprisedb.com

Attachment | Content-Type | Size |
---|---|---|
v1-0001-Fix-incremental-backup-interaction-with-XLOG_DBAS.patch | application/octet-stream | 8.5 KB |