From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | David Steele <david(at)pgbackrest(dot)org> |
Cc: | "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Bug in recovery of drop database? |
Date: | 2025-02-06 18:05:31 |
Message-ID: | cv3jqr2n6o4jgp6f6mvscecno3i4ps5hgbs57h72djw47vjzu7@uvmkcshcspyv |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
Hi,
On 2025-02-06 15:55:21 +0000, David Steele wrote:
> On 2/5/25 15:24, David Steele wrote:
> >
> > psql: error: connection to server on socket "/tmp/.s.PGSQL.5432" failed:
> > FATAL: cannot connect to invalid database "mytest"
> > HINT: Use DROP DATABASE to drop invalid databases.
>
> Since this error message started cropping up fairly recently I had a look at
> related commits and found this:
>
> c66a7d75 Handle DROP DATABASE getting interrupted
>
> This commit changed the error message so it is more obvious that something
> has gone wrong, but the underlying issue appears the same for recovery.
>
> Before c66a7d75 it looked like this:
>
> FATAL: database "mytest" does not exist
> DETAIL: The database subdirectory "base/16384" is missing.
>
> Which seems reasonable when just looking at the error and not the detail.
Yea, I don't think my commit fundamentally changed things, it just made the
issue a bit more apparent. The biggest change is that it actually allows to
"fix" the leaked space from SQL.
To actually fix this problem we'd need to move the dropping of database files
to happen as part of transaction commit, similar to how relation files aren't
actually truncated the moment you truncate a table. That's not a trivial
change...
WRT your repro:
> test/pg/bin/psql mytest -t -c 'select now()'
> 2025-02-05 19:18:26.245167+00
>
> test/pg/bin/dropdb mytest
> test/pg/bin/psql mytest
> psql: error: connection to server on socket "/tmp/.s.PGSQL.5432" failed:
> FATAL: database "mytest" does not exist
> ...
> echo "recovery_target_time = '2025-02-05 19:18:26.245167+00'" >>
> test/data/postgresql.auto.conf
> ...
> test/pg/bin/psql mytest -c 'select count(*) from pgbench_accounts'
> psql: error: connection to server on socket "/tmp/.s.PGSQL.5432" failed:
> FATAL: cannot connect to invalid database "mytest"
> HINT: Use DROP DATABASE to drop invalid databases.
I'd guess this is due to using the default recovery_target_inclusive=true.
The reason:
> However, if I insert this command before the drop database:
>
> test/pg/bin/psql mytest -c 'create table test (id int)'
>
> Then recovery works fine since it is able to stop before the the drop
> database. This is an issue on HEAD as well as all current back branches.
works, presumably is that now there's another transaction to stop at
*after*. Without the additional transaction the first complete transaction
with the recovery_target_time
IOW, while the two-step-drop-database is an issue, even without it, your
reproducer doesn't seem like it actually would work, even if it were a single
step, to implement the below scenario:
> We have seen this several times now because a favorite test for users is to
> drop a database and see if they can recover it with PITR. PITR in a test
> database is always a bit dicey since little/no WAL activity means recovery
> doesn't have much to go on RE recovery targets, but at the least the
> database should be fully there or fully not there.
Because the repro actually sets the recovery target to *after* the drop
database.
Am I misunderstanding something?
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | David Steele | 2025-02-06 19:28:50 | Re: Bug in recovery of drop database? |
Previous Message | David Steele | 2025-02-06 15:55:21 | Re: Bug in recovery of drop database? |