Re: Exceptional md.c paths for recovery and zero_damaged_pages

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: Exceptional md.c paths for recovery and zero_damaged_pages
Date: 2024-12-14 03:05:03
Message-ID: kzmowqwplg5jwl5gu5b6u6fj7glo2xyqfsue4h4xtww2xfgn3u@xufsmkqi7fjm
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2024-12-13 19:06:16 -0500, Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > We have a fair number of special paths in md.c that are specific to
> > recovery. E.g. in mdreadv() we do:
> > ...
> > As far as I can tell, nearly all - including the above - InRecovery paths in
> > md.c are basically unreachable. And have been for quite a while.
>
> > XLogReadBufferExtended() takes care to
> > a) Create the fork if it doesn't yet exist.
> > b) Not to read from beyond EOF. If EOF is found, we extend the relation to be
> > large enough.
> > Which afaict should suffice to prevent needing the above?
>
> I haven't checked the git history, but I suspect this logic is later
> than the md.c code you mention, and may well render it obsolete.

Oddly enough, not really - the logic originated in the original REDO/UNDO
commit, in 2000 (b58c0411bad4).

> > The InRecovery paths for _mdfd_getseg seem to originate in 2004's 303e46ea932
> > and the zero-beyond-eof seems to be from 2007's ef07221997e - although it was
> > just *restricted* to InRecovery in that commit.
>
> We definitely needed 303e46ea932 at the time, but that doesn't mean
> we still do.

I'd assume it was needed at the time, I just can't quite figure out by what.

We used to have more code using the fake relcache stuff, which indicates that
those places weren't going through extend-the-file logic in
XLogReadBufferExtended(). E.g. the "incomplete split" logic that Heikki
removed from a bunch of indexes.

Another theory is that it could have been related to wal_level=minimal. Before
Noah's redesign in c6b92041d38 there were rather weird mixes of logged and
unlogged modifications to the same relfilenode. I don't quite see how it could
have required this behaviour in mdreadv() though.

> Possibly ef07221997e was just copying the earlier logic.

It did note this:
Also, remove the kluge of allowing mdread()
to return zeroes from a read-beyond-EOF: this is now an error condition
except when InRecovery or zero_damaged_pages = true. (Hash indexes used to
require that behavior, but no more.)

which is about this hunk:

/*
- * If we are at or past EOF, return zeroes without complaining. Also
- * substitute zeroes if we found a partial block at EOF.
- *
- * XXX this is really ugly, bad design. However the current
- * implementation of hash indexes requires it, because hash index
- * pages are initialized out-of-order.
+ * Short read: we are at or past EOF, or we read a partial block at
+ * EOF. Normally this is an error; upper levels should never try to
+ * read a nonexistent block. However, if zero_damaged_pages is ON
+ * or we are InRecovery, we should instead return zeroes without
+ * complaining. This allows, for example, the case of trying to
+ * update a block that was later truncated away.
*/

Before this there was no mention of recovery, so perhaps the InRecovery bit
was just trying to not break recovery (there weren't any tests for that at the
time, I think, so that'd be quite resonable).

I'm wondering if we should just put an error into the relevant paths in HEAD
and see whether it triggers for anybody in the next months. Having all these
untested paths in md.c forever doesn't seem great.

Greetings,

Andres Freund

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2024-12-14 03:15:05 Re: Fix early elog(FATAL)
Previous Message John Naylor 2024-12-14 03:00:03 Re: typo in a comment of restrictinfo.c