From: | David Steele <david(at)pgbackrest(dot)org> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Bug in recovery of drop database? |
Date: | 2025-02-06 19:28:50 |
Message-ID: | c1c2c818-888a-45d8-8994-4ed591cc11ed@pgbackrest.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
On 2/6/25 13:05, Andres Freund wrote:
> On 2025-02-06 15:55:21 +0000, David Steele wrote:
>>
>> Before c66a7d75 it looked like this:
>>
>> FATAL: database "mytest" does not exist
>> DETAIL: The database subdirectory "base/16384" is missing.
>>
>> Which seems reasonable when just looking at the error and not the detail.
>
> Yea, I don't think my commit fundamentally changed things, it just made the
> issue a bit more apparent. The biggest change is that it actually allows to
> "fix" the leaked space from SQL.
>
> To actually fix this problem we'd need to move the dropping of database files
> to happen as part of transaction commit, similar to how relation files aren't
> actually truncated the moment you truncate a table. That's not a trivial
> change...
Understatement.
> WRT your repro:
>> test/pg/bin/psql mytest -t -c 'select now()'
>> 2025-02-05 19:18:26.245167+00
>>
>> test/pg/bin/dropdb mytest
>> test/pg/bin/psql mytest
>> psql: error: connection to server on socket "/tmp/.s.PGSQL.5432" failed:
>> FATAL: database "mytest" does not exist
>> ...
>> echo "recovery_target_time = '2025-02-05 19:18:26.245167+00'" >>
>> test/data/postgresql.auto.conf
>> ...
>> test/pg/bin/psql mytest -c 'select count(*) from pgbench_accounts'
>> psql: error: connection to server on socket "/tmp/.s.PGSQL.5432" failed:
>> FATAL: cannot connect to invalid database "mytest"
>> HINT: Use DROP DATABASE to drop invalid databases.
>
> I'd guess this is due to using the default recovery_target_inclusive=true.
I got the same result with recovery_target_inclusive=false. I think you
could get it to work with a more precise timestamp or better, xid or lsn.
> The reason:
>
>> However, if I insert this command before the drop database:
>>
>> test/pg/bin/psql mytest -c 'create table test (id int)'
>>
>> Then recovery works fine since it is able to stop before the the drop
>> database. This is an issue on HEAD as well as all current back branches.
>
> works, presumably is that now there's another transaction to stop at
> *after*. Without the additional transaction the first complete transaction
> with the recovery_target_time
Given I'm grabbing the timestamp before the create table I'd expect it
to stop before the table is created whether recovery_target_inclusive is
true or false. I just gave it a try and sure enough either way I get:
$ test/pg/bin/psql mytest -c 'select count(*) from test'
ERROR: relation "test" does not exist
> IOW, while the two-step-drop-database is an issue, even without it, your
> reproducer doesn't seem like it actually would work, even if it were a single
> step, to implement the below scenario:
The repro works almost every time for me on all six active branches. It
hasn't worked a few times, but in those cases I suspect I accidentally
skipped a step. I'm running it manually each time.
>> We have seen this several times now because a favorite test for users is to
>> drop a database and see if they can recover it with PITR. PITR in a test
>> database is always a bit dicey since little/no WAL activity means recovery
>> doesn't have much to go on RE recovery targets, but at the least the
>> database should be fully there or fully not there.
>
> Because the repro actually sets the recovery target to *after* the drop
> database.
>
> Am I misunderstanding something?
Without looking at the code, my guess is the dropping of files WAL
record does not have a transaction timestamp that recovery can use so
recovery blows by the dropping of files and stops before the entry is
removed from pg_database.
It sounds like this is more or less a known issue. If we decide not to
fix it how about I at least document it as a caveat for drop database?
Regards,
-David
From | Date | Subject | |
---|---|---|---|
Next Message | David Steele | 2025-02-07 14:16:39 | Re: Bug in recovery of drop database? |
Previous Message | Andres Freund | 2025-02-06 18:05:31 | Re: Bug in recovery of drop database? |