Re: Bug in recovery of drop database?

From: David Steele <david(at)pgbackrest(dot)org>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Bug in recovery of drop database?
Date: 2025-02-06 19:28:50
Message-ID: c1c2c818-888a-45d8-8994-4ed591cc11ed@pgbackrest.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On 2/6/25 13:05, Andres Freund wrote:
> On 2025-02-06 15:55:21 +0000, David Steele wrote:
>>
>> Before c66a7d75 it looked like this:
>>
>> FATAL: database "mytest" does not exist
>> DETAIL: The database subdirectory "base/16384" is missing.
>>
>> Which seems reasonable when just looking at the error and not the detail.
>
> Yea, I don't think my commit fundamentally changed things, it just made the
> issue a bit more apparent. The biggest change is that it actually allows to
> "fix" the leaked space from SQL.
>
> To actually fix this problem we'd need to move the dropping of database files
> to happen as part of transaction commit, similar to how relation files aren't
> actually truncated the moment you truncate a table. That's not a trivial
> change...

Understatement.

> WRT your repro:
>> test/pg/bin/psql mytest -t -c 'select now()'
>> 2025-02-05 19:18:26.245167+00
>>
>> test/pg/bin/dropdb mytest
>> test/pg/bin/psql mytest
>> psql: error: connection to server on socket "/tmp/.s.PGSQL.5432" failed:
>> FATAL: database "mytest" does not exist
>> ...
>> echo "recovery_target_time = '2025-02-05 19:18:26.245167+00'" >>
>> test/data/postgresql.auto.conf
>> ...
>> test/pg/bin/psql mytest -c 'select count(*) from pgbench_accounts'
>> psql: error: connection to server on socket "/tmp/.s.PGSQL.5432" failed:
>> FATAL: cannot connect to invalid database "mytest"
>> HINT: Use DROP DATABASE to drop invalid databases.
>
> I'd guess this is due to using the default recovery_target_inclusive=true.

I got the same result with recovery_target_inclusive=false. I think you
could get it to work with a more precise timestamp or better, xid or lsn.

> The reason:
>
>> However, if I insert this command before the drop database:
>>
>> test/pg/bin/psql mytest -c 'create table test (id int)'
>>
>> Then recovery works fine since it is able to stop before the the drop
>> database. This is an issue on HEAD as well as all current back branches.
>
> works, presumably is that now there's another transaction to stop at
> *after*. Without the additional transaction the first complete transaction
> with the recovery_target_time

Given I'm grabbing the timestamp before the create table I'd expect it
to stop before the table is created whether recovery_target_inclusive is
true or false. I just gave it a try and sure enough either way I get:

$ test/pg/bin/psql mytest -c 'select count(*) from test'
ERROR: relation "test" does not exist

> IOW, while the two-step-drop-database is an issue, even without it, your
> reproducer doesn't seem like it actually would work, even if it were a single
> step, to implement the below scenario:

The repro works almost every time for me on all six active branches. It
hasn't worked a few times, but in those cases I suspect I accidentally
skipped a step. I'm running it manually each time.

>> We have seen this several times now because a favorite test for users is to
>> drop a database and see if they can recover it with PITR. PITR in a test
>> database is always a bit dicey since little/no WAL activity means recovery
>> doesn't have much to go on RE recovery targets, but at the least the
>> database should be fully there or fully not there.
>
> Because the repro actually sets the recovery target to *after* the drop
> database.
>
> Am I misunderstanding something?

Without looking at the code, my guess is the dropping of files WAL
record does not have a transaction timestamp that recovery can use so
recovery blows by the dropping of files and stops before the entry is
removed from pg_database.

It sounds like this is more or less a known issue. If we decide not to
fix it how about I at least document it as a caveat for drop database?

Regards,
-David

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message David Steele 2025-02-07 14:16:39 Re: Bug in recovery of drop database?
Previous Message Andres Freund 2025-02-06 18:05:31 Re: Bug in recovery of drop database?