Re: basebackups seem to have serious issues with FILE_COPY in CREATE DATABASE

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: basebackups seem to have serious issues with FILE_COPY in CREATE DATABASE
Date: 2024-06-24 15:29:42
Message-ID: 792393ae-d9ac-4684-9ef6-5dd4bbb08f55@enterprisedb.com
Lists: pgsql-hackers

On 6/24/24 17:14, Nathan Bossart wrote:
> On Mon, Jun 24, 2024 at 04:12:38PM +0200, Tomas Vondra wrote:
>> The important observation is that this only happens if a database is
>> created while the backup is running, and that it only happens with the
>> FILE_COPY strategy - I've never seen this with WAL_LOG (which is the
>> default since PG15).
>
> My first thought is that this sounds related to the large comment in
> CreateDatabaseUsingFileCopy():
>
> /*
> * We force a checkpoint before committing. This effectively means that
> * committed XLOG_DBASE_CREATE_FILE_COPY operations will never need to be
> * replayed (at least not in ordinary crash recovery; we still have to
> * make the XLOG entry for the benefit of PITR operations). This avoids
> * two nasty scenarios:
> *
> * #1: When PITR is off, we don't XLOG the contents of newly created
> * indexes; therefore the drop-and-recreate-whole-directory behavior of
> * DBASE_CREATE replay would lose such indexes.
> *
> * #2: Since we have to recopy the source database during DBASE_CREATE
> * replay, we run the risk of copying changes in it that were committed
> * after the original CREATE DATABASE command but before the system crash
> * that led to the replay. This is at least unexpected and at worst could
> * lead to inconsistencies, eg duplicate table names.
> *
> * (Both of these were real bugs in releases 8.0 through 8.0.3.)
> *
> * In PITR replay, the first of these isn't an issue, and the second is
> * only a risk if the CREATE DATABASE and subsequent template database
> * change both occur while a base backup is being taken. There doesn't
> * seem to be much we can do about that except document it as a
> * limitation.
> *
> * See CreateDatabaseUsingWalLog() for a less cheesy CREATE DATABASE
> * strategy that avoids these problems.
> */
>

Perhaps. The risks mentioned there certainly seem like they might be
related to the issues I'm observing.

>> I don't recall any reports of similar issues from pre-15 releases, where
>> FILE_COPY was the only available option - I'm not sure why is that.
>> Either it didn't have this issue back then, or maybe people happen to
>> not create databases concurrently with a backup very often. It's a race
>> condition / timing issue, essentially.
>
> If it requires concurrent activity on the template database, I wouldn't be
> surprised at all that this is rare.
>

Right. Although, "concurrent" here means a somewhat different thing.
AFAIK there can't be any changes concurrent with the CREATE DATABASE
directly, because we make sure there are no connections:

createdb: error: database creation failed: ERROR: source database
"test" is being accessed by other users
DETAIL: There is 1 other session using the database.

But per the comment, it'd be a problem if there is activity in the
template after the database gets copied, but before the backup completes
(restoring from that backup is where the replay will happen).
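
To make scenario #2 from the quoted comment concrete, here is a toy model
in Python. It is only a sketch and not PostgreSQL code: databases are
modelled as sets of table names, and the record names
("create_db_file_copy", "create_table") and the replay() helper are made up
for illustration. It assumes the backup copies the files after the template
was changed, and that replaying the create record re-copies the template as
it exists in the restored cluster.

```python
import copy


def replay(backup_files, wal):
    """Replay toy WAL records on top of a toy base backup."""
    cluster = copy.deepcopy(backup_files)
    for record in wal:
        kind = record[0]
        if kind == "create_db_file_copy":
            _, source, target = record
            # Drop-and-recreate: copy the source as it looks *now* in the
            # restored cluster, not as it looked at CREATE DATABASE time.
            cluster[target] = set(cluster[source])
        elif kind == "create_table":
            _, db, table = record
            cluster[db].add(table)
    return cluster


# Timeline on the primary; the backup has already started, so every
# record below is part of the WAL that the restore will replay.
primary = {"template": {"t1"}}
wal = []

# CREATE DATABASE newdb TEMPLATE template, using the file-copy strategy.
primary["newdb"] = set(primary["template"])
wal.append(("create_db_file_copy", "template", "newdb"))

# Later change in the template, after the copy but before the backup ends.
primary["template"].add("t2")
wal.append(("create_table", "template", "t2"))

# The backup happens to copy the files only at this point.
backup_files = copy.deepcopy(primary)

restored = replay(backup_files, wal)
print("primary newdb: ", sorted(primary["newdb"]))
print("restored newdb:", sorted(restored["newdb"]))
```

In this model the restored "newdb" ends up with a table it never had on the
primary, which is the kind of divergence the comment warns about. Whether
that is the same thing I'm seeing in the actual backups is not established
by this sketch.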

>> I see there have been a couple threads proposing various improvements to
>> FILE_COPY, that might make it more efficient/faster, namely using the
>> filesystem cloning [1] or switching pg_upgrade to use it [2]. But having
>> something that's (maybe) faster but not quite correct does not seem like
>> a winning strategy to me ...
>>
>> Alternatively, if we don't have clear desire to fix it, maybe the right
>> solution would be get rid of it?
>
> It would be unfortunate if we couldn't use this for pg_upgrade, especially
> if it is unaffected by these problems.
>

Yeah. I wouldn't mind using FILE_COPY in contexts where we know it's
safe, like pg_upgrade. I just don't want to let users unknowingly
step on this.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
