basebackups seem to have serious issues with FILE_COPY in CREATE DATABASE

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: basebackups seem to have serious issues with FILE_COPY in CREATE DATABASE
Date: 2024-06-24 14:12:38
Message-ID: bf9bc731-5cd4-42b0-879b-57d6be2204ac@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

While doing some additional testing of (incremental) backups, I ran into
a couple regular failures. After pulling my hair for a couple days, I
realized the issue seems to affect regular backups, and incremental
backups (which I've been trying to test) are likely innocent.

I'm using a simple (and admittedly not very pretty) bash scripts that
takes and verified backups, concurrently with this workload:

1) initialize a cluster

2) initialize pgbench in database 'db'

3) run short pgbench on 'db'

4) maybe do vacuum [full] on 'db'

5) drop a database 'db_copy' if it exists

6) create a database 'db_copy' by copying 'db' using one of the
available strategies (file_copy, wal_log)

7) run short pgbench on 'db_copy'

8) maybe do vacuum [full] on 'db_copy'

And concurrently with this, it takes a basebackup, starts a cluster on
it (on a different port, ofc), and does various checks on that:

a) verify checksums using pg_checksums (cluster has them enabled)

b) run amcheck on tables/indexes on both databases

c) SQL check (we expect all tables to be 'consistent' as if we did a
PITR - in particular sum(balance) is expected to be the same value on
all pgbench tables) on both databases

I believe those are reasonable expectations - that we get a database
with valid checksums, with non-broken tables/indexes, and that the
database looks as a snapshot taken at a single instant.

Unfortunately it doesn't take long for the tests to start failing with
various strange symptoms on the db_copy database (I'm yet to see an
issue on the 'db' database):

i) amcheck fails with 'heap tuple lacks matching index tuple'

ERROR: heap tuple (116195,22) from table "pgbench_accounts" lacks
matching index tuple within index "pgbench_accounts_pkey"
HINT: Retrying verification using the function
bt_index_parent_check() might provide a more specific error.

I've seen this with other tables/indexes too, e.g. system catalogs
pg_statitics or toast tables, but 'accounts' is most common.

ii) amcheck fails with 'could not open file'

ERROR: could not open file "base/18121/18137": No such file or
directory
LINE 9: lateral verify_heapam(relation => c.oid, on_error_stop =>
f...
^
ERROR: could not open file "base/18121/18137": No such file or
directory

iii) failures in the SQL check, with different tables have different
balance sums

SQL check fails (db_copy) (account 156142 branches 136132 tellers
136132 history -42826)

Sometimes this is preceded by amcheck issue, but not always.

I guess this is not the behavior we expect :-(

I've reproduced all of this on PG16 - I haven't tried with older
releases, but I have no reason to assume pre-16 releases are not affected.

With incremental backups I've observed a couple more symptoms, but those
are most likely just fallout of this - not realizing the initial state
is a bit wrong, and making it worse by applying the increments.

The important observation is that this only happens if a database is
created while the backup is running, and that it only happens with the
FILE_COPY strategy - I've never seen this with WAL_LOG (which is the
default since PG15).

I don't recall any reports of similar issues from pre-15 releases, where
FILE_COPY was the only available option - I'm not sure why is that.
Either it didn't have this issue back then, or maybe people happen to
not create databases concurrently with a backup very often. It's a race
condition / timing issue, essentially.

I have no ambition to investigate this part of the code much deeper, or
invent a fix myself, at least not in foreseeable future. But it seems
like something we probably should fix - subtly broken backups are not a
great thing.

I see there have been a couple threads proposing various improvements to
FILE_COPY, that might make it more efficient/faster, namely using the
filesystem cloning [1] or switching pg_upgrade to use it [2]. But having
something that's (maybe) faster but not quite correct does not seem like
a winning strategy to me ...

Alternatively, if we don't have clear desire to fix it, maybe the right
solution would be get rid of it?

regards

[1]
https://www.postgresql.org/message-id/CA+hUKGLM+t+SwBU-cHeMUXJCOgBxSHLGZutV5zCwY4qrCcE02w@mail.gmail.com

[2] https://www.postgresql.org/message-id/Zl9ta3FtgdjizkJ5%40nathan

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
basebackup-test-scripts.tgz application/x-compressed-tar 13.9 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2024-06-24 14:36:04 Re: long-standing data loss bug in initial sync of logical replication
Previous Message Jelte Fennema-Nio 2024-06-24 13:18:56 Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs