Fail hard if xlogreader.c fails on out-of-memory

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Fail hard if xlogreader.c fails on out-of-memory
Date: 2023-09-26 07:38:28
Message-ID: ZRKKdI5-RRlta3aF@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi all,
(Thomas in CC.)

Now that becfbdd6c1c9 has improved the situation to detect the
difference between out-of-memory and invalid WAL data in WAL, I guess
that it is time to tackle the problem of what we should do when
reading WAL records bit fail on out-of-memory.

To summarize, currently the WAL reader APIs fail the same way if we
detect some incorrect WAL record or if a memory allocation fails: an
error is generated and returned back to the caller to consume. For
WAL replay, not being able to make the difference between an OOM and
the end-of-wal is a problem in some cases. For example, in crash
recovery, failing an internal allocation will be detected as the
end-of-wal, causing recovery to stop prematurely. In the worst cases,
this silently corrupts clusters because not all the records generated
in the local pg_wal/ have been replayed. Oops.

When in standby mode, things are a bit better, because we'd just loop
and wait for the next record. But, even in this case, if the startup
process does a crash recovery while standby is set up, we may finish
by attempting recovery from a different source than the local pg_wal/.
Not strictly critical, but less optimal in some cases as we could
switch to archive recovery earlier than necessary.

In a different thread, I have proposed to extend the WAL reader
facility so as an error code is returned to make the difference
between an OOM or the end of WAL with an incorrect record:
https://www.postgresql.org/message-id/ZRJ-p1dLUY0uoChc%40paquier.xyz

However this requires some ABI changes, so that's not backpatchable.

This leaves out what we can do for the existing back-branches, and
one option is to do the simplest thing I can think of: if an
allocation fails, just fail *hard*. The allocations of the WAL reader
rely on palloc_extended(), so I'd like to suggest that we switch to
palloc() instead. If we do so, an ERROR is promoted to a FATAL during
WAL replay, which makes sure that we will never stop recovery earlier
than we should, FATAL-ing before things go wrong.

Note that the WAL prefetching already relies on a palloc() on HEAD in
XLogReadRecordAlloc(), which would fail hard the same way on OOM.

So, attached is a proposal of patch to do something down to 12.

Thoughts and/or comments are welcome.
--
Michael

Attachment Content-Type Size
wal-oom-fail.patch text/x-diff 3.5 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Étienne BERSAC 2023-09-26 07:56:27 Re: logfmt and application_context
Previous Message Peter Smith 2023-09-26 07:32:58 Re: Invalidate the subscription worker in cases where a user loses their superuser status