Re: basebackups seem to have serious issues with FILE_COPY in CREATE DATABASE

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: basebackups seem to have serious issues with FILE_COPY in CREATE DATABASE
Date: 2024-06-24 15:14:01
Message-ID: ZnmNOV2pm7b5N3zm@nathan
Lists: pgsql-hackers

On Mon, Jun 24, 2024 at 04:12:38PM +0200, Tomas Vondra wrote:
> The important observation is that this only happens if a database is
> created while the backup is running, and that it only happens with the
> FILE_COPY strategy - I've never seen this with WAL_LOG (which is the
> default since PG15).

My first thought is that this sounds related to the large comment in
CreateDatabaseUsingFileCopy():

/*
* We force a checkpoint before committing. This effectively means that
* committed XLOG_DBASE_CREATE_FILE_COPY operations will never need to be
* replayed (at least not in ordinary crash recovery; we still have to
* make the XLOG entry for the benefit of PITR operations). This avoids
* two nasty scenarios:
*
* #1: When PITR is off, we don't XLOG the contents of newly created
* indexes; therefore the drop-and-recreate-whole-directory behavior of
* DBASE_CREATE replay would lose such indexes.
*
* #2: Since we have to recopy the source database during DBASE_CREATE
* replay, we run the risk of copying changes in it that were committed
* after the original CREATE DATABASE command but before the system crash
* that led to the replay. This is at least unexpected and at worst could
* lead to inconsistencies, eg duplicate table names.
*
* (Both of these were real bugs in releases 8.0 through 8.0.3.)
*
* In PITR replay, the first of these isn't an issue, and the second is
* only a risk if the CREATE DATABASE and subsequent template database
* change both occur while a base backup is being taken. There doesn't
* seem to be much we can do about that except document it as a
* limitation.
*
* See CreateDatabaseUsingWalLog() for a less cheesy CREATE DATABASE
* strategy that avoids these problems.
*/
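
To make scenario #2 in that comment concrete, here is a toy model in Python. It is only a sketch under loud assumptions: it is not PostgreSQL code, databases are modeled as sets of table names, and every name in it (file_copy_create, create_table, replay) is hypothetical. It shows how replaying a FILE_COPY record on top of a base backup can pick up a template change that was made after the original CREATE DATABASE.

```python
import copy

def file_copy_create(cluster, wal, template, new_db):
    """Toy FILE_COPY: copy the template's files, log only that a copy happened."""
    cluster[new_db] = copy.deepcopy(cluster[template])
    wal.append(("DBASE_CREATE_FILE_COPY", template, new_db))

def create_table(cluster, wal, db, table):
    """Toy DDL: add a table to a database and log the change."""
    cluster[db].add(table)
    wal.append(("CREATE_TABLE", db, table))

def replay(base, wal):
    """Toy replay: start from the backed-up files and apply each record in order."""
    cluster = copy.deepcopy(base)
    for record in wal:
        if record[0] == "DBASE_CREATE_FILE_COPY":
            _, template, new_db = record
            # Recopy from the template as it exists at replay time,
            # not as it existed when CREATE DATABASE originally ran.
            cluster[new_db] = copy.deepcopy(cluster[template])
        elif record[0] == "CREATE_TABLE":
            _, db, table = record
            cluster[db].add(table)
    return cluster

# Primary: the backup starts here, so the WAL below is what the backup replays.
primary = {"template": {"t1"}}
wal = []

file_copy_create(primary, wal, "template", "newdb")  # newdb gets only t1
create_table(primary, wal, "template", "t2")         # template changes afterwards

# The backup happens to copy the template's files after t2 was created.
base = {"template": set(primary["template"])}

restored = replay(base, wal)
print("primary  newdb:", sorted(primary["newdb"]))
print("restored newdb:", sorted(restored["newdb"]))
```

In this model the restored newdb ends up containing t2 even though the primary's newdb never did, because both the CREATE DATABASE and the later template change fell inside the backup window.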

> I don't recall any reports of similar issues from pre-15 releases, where
> FILE_COPY was the only available option - I'm not sure why that is.
> Either it didn't have this issue back then, or maybe people happen to
> not create databases concurrently with a backup very often. It's a race
> condition / timing issue, essentially.

If it requires concurrent activity on the template database, I wouldn't be
surprised at all that this is rare.

> I see there have been a couple threads proposing various improvements to
> FILE_COPY, that might make it more efficient/faster, namely using the
> filesystem cloning [1] or switching pg_upgrade to use it [2]. But having
> something that's (maybe) faster but not quite correct does not seem like
> a winning strategy to me ...
>
> Alternatively, if we don't have a clear desire to fix it, maybe the right
> solution would be to get rid of it?

It would be unfortunate if we couldn't use this for pg_upgrade, especially
if it is unaffected by these problems.

--
nathan
