deferred writing of two-phase state files adds fragility

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: deferred writing of two-phase state files adds fragility
Date: 2024-12-04 17:04:47
Message-ID: CA+Tgmob2e542abFO-RspquqVYzpt7X4JeOKMDVXwDEowqzmcOg@mail.gmail.com
Lists: pgsql-hackers

Let's suppose that you execute PREPARE TRANSACTION and, before the
next CHECKPOINT, the WAL record for the PREPARE TRANSACTION gets
corrupted on disk. This might seem like an unlikely scenario, and it
is, but we saw a case at EDB not too long ago.
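
To make the window concrete, the sequence is just this (table and GID
names are illustrative, and max_prepared_transactions has to be nonzero
for PREPARE TRANSACTION to be accepted at all):

    BEGIN;
    INSERT INTO some_table VALUES (1);   -- any work that dirties data
    PREPARE TRANSACTION 'demo_gid';      -- the state is emitted as a WAL record
    -- Until the next CHECKPOINT copies that record into pg_twophase/, the
    -- WAL holding it is the only durable copy of the prepared transaction.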

To a first approximation, the world ends. You can't execute COMMIT
PREPARED or ROLLBACK PREPARED, so there's no way to resolve the
prepared transaction. You also can't checkpoint, because that requires
writing a two-phase state file for the prepared transaction, and that's
not possible because the WAL can't be read. What you have is a mostly
working system, except that it's going to bloat over time because the
prepared transaction holds back the VACUUM horizon. And you basically
have no way out of that problem, because there's no tool that says "I
understand that my database is going to be corrupted, that's ok, just
forget about that two-phase transaction".

If you shut down the database, then things become truly awful. You
can't get a clean shutdown because you can't checkpoint, so on restart
you're going to resume recovery from the last checkpoint before the
problem happened, hit the corrupted WAL, and fail. As long as your
database is up, you at least have the possibility of getting all of
your data out of it by running pg_dump, provided you can survive the
amount of time that takes; and if you do that, you don't even have
corruption. But once your database has gone down, you can't get it
back up again without running pg_resetwal. Running pg_resetwal is not
very appealing here -- first because now you do have corruption,
whereas before the shutdown you didn't, and second because the last
checkpoint could already be a long time in the past, depending on how
quickly you realized you had this problem.
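
As commands, the two exits look roughly like this (paths are
illustrative):

    # While the cluster is still up, a logical dump is the only
    # corruption-free way out:
    pg_dumpall -f /backup/rescue.sql

    # Once it has gone down, the only way to start it again is to throw
    # away WAL, accepting whatever corruption that implies:
    pg_resetwal -f /var/lib/postgresql/data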

Before 728bd991c3c4389fb39c45dcb0fe57e4a1dccd71, things would not have
been quite so bad. Checkpoints wouldn't fail, so you might never even
realize you had a problem, or you might just need to rebuild your
standbys. If you had corruption in a different place, like the two-phase
state file itself, you could simply shut down cleanly, remove the file,
and start back up. I'm not quite sure whether that's equivalent to a
forced abort of the two-phase transaction or whether it might leave you
with some latent corruption, but I suspect the problems you'd have would
be pretty tame compared to what happens in the scenario described above.
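
That old escape hatch was purely mechanical, something like this (file
name illustrative; state files are named after the transaction ID in
hex):

    pg_ctl -D $PGDATA stop -m fast     # a clean shutdown still worked pre-728bd991
    rm $PGDATA/pg_twophase/0000052A    # drop the damaged state file
    pg_ctl -D $PGDATA start            # the prepared transaction is simply gone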

Just to be clear, I am not suggesting that we should revert that
commit. I'm actually not sure whether we should change anything at
all, but I'm not very comfortable with the status quo, either. It's
unavoidable that the database will sometimes end up in a bad state --
Murphy's law, entropy, or whatever you want to call it guarantees
that. But I like it a lot better when there's something that I can
reasonably do to get the database OUT of that bad state, and in this
situation nothing works -- or at least, nothing that I could think of
works. It would be nice to improve on that somehow, if anybody has a
good idea.

--
Robert Haas
EDB: http://www.enterprisedb.com
