Re: Various bugs if segment containing redo pointer does not exist

From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: Re: Various bugs if segment containing redo pointer does not exist
Date: 2023-10-24 00:43:52
Message-ID: 20231024004352.penbryfk3fvisocw@awork3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2023-10-23 16:21:45 -0700, Andres Freund wrote:
> 1) For some reason I haven't yet debugged, the ReadRecord(PANIC) in
> PerformWalRecovery() doesn't PANIC, but instead just returns NULL
>
> We *do* output a DEBUG message, but well, that's insufficient.

That DEBUG message is output from this backtrace:

#0 XLogFileReadAnyTLI (segno=6476, emode=13, source=XLOG_FROM_PG_WAL) at /home/andres/src/postgresql/src/backend/access/transam/xlogrecovery.c:4291
#1 0x000055d7b3949db0 in WaitForWALToBecomeAvailable (RecPtr=108649259008, randAccess=true, fetching_ckpt=false, tliRecPtr=108665421680, replayTLI=1,
replayLSN=108665421680, nonblocking=false) at /home/andres/src/postgresql/src/backend/access/transam/xlogrecovery.c:3697
#2 0x000055d7b39494ff in XLogPageRead (xlogreader=0x55d7b472c470, targetPagePtr=108649250816, reqLen=8192, targetRecPtr=108665421680,
readBuf=0x55d7b47ba5d8 "\024\321\005") at /home/andres/src/postgresql/src/backend/access/transam/xlogrecovery.c:3278
#3 0x000055d7b3941bb1 in ReadPageInternal (state=0x55d7b472c470, pageptr=108665413632, reqLen=8072)
at /home/andres/src/postgresql/src/backend/access/transam/xlogreader.c:1014
#4 0x000055d7b3940f43 in XLogDecodeNextRecord (state=0x55d7b472c470, nonblocking=false)
at /home/andres/src/postgresql/src/backend/access/transam/xlogreader.c:571
#5 0x000055d7b3941a41 in XLogReadAhead (state=0x55d7b472c470, nonblocking=false) at /home/andres/src/postgresql/src/backend/access/transam/xlogreader.c:947
#6 0x000055d7b393f5fa in XLogPrefetcherNextBlock (pgsr_private=94384934340072, lsn=0x55d7b47cfeb8)
at /home/andres/src/postgresql/src/backend/access/transam/xlogprefetcher.c:496
#7 0x000055d7b393efcd in lrq_prefetch (lrq=0x55d7b47cfe88) at /home/andres/src/postgresql/src/backend/access/transam/xlogprefetcher.c:256
#8 0x000055d7b393f190 in lrq_complete_lsn (lrq=0x55d7b47cfe88, lsn=0) at /home/andres/src/postgresql/src/backend/access/transam/xlogprefetcher.c:294
#9 0x000055d7b39401ba in XLogPrefetcherReadRecord (prefetcher=0x55d7b47bc5e8, errmsg=0x7ffc23505920)
at /home/andres/src/postgresql/src/backend/access/transam/xlogprefetcher.c:1041
#10 0x000055d7b3948ff8 in ReadRecord (xlogprefetcher=0x55d7b47bc5e8, emode=23, fetching_ckpt=false, replayTLI=1)
at /home/andres/src/postgresql/src/backend/access/transam/xlogrecovery.c:3078
#11 0x000055d7b3946749 in PerformWalRecovery () at /home/andres/src/postgresql/src/backend/access/transam/xlogrecovery.c:1640

The source of emode=13, i.e. DEBUG2, is that it's hardcoded in
WaitForWALToBecomeAvailable(). I guess the error ought to come from
XLogPageRead(), but all that happens there is this:

    case XLREAD_FAIL:
        if (readFile >= 0)
            close(readFile);
        readFile = -1;
        readLen = 0;
        readSource = XLOG_FROM_ANY;
        return XLREAD_FAIL;

which *does* error out for some other failures:
    errno = save_errno;
    ereport(emode_for_corrupt_record(emode, targetPagePtr + reqLen),
            (errcode_for_file_access(),
             errmsg("could not read from WAL segment %s, LSN %X/%X, offset %u: %m",
                    fname, LSN_FORMAT_ARGS(targetPagePtr),
                    readOff)));

but not for a file that couldn't be opened. And that wouldn't even have to be
due to ENOENT, it could also be EACCES...
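
Just to illustrate the kind of report I mean, the XLREAD_FAIL branch could do
something roughly like the following. This is a hypothetical sketch, not a
tested patch - in particular I'm assuming emode, targetRecPtr and
private->replayTLI are all available in XLogPageRead() under those names:

    case XLREAD_FAIL:
        if (readFile >= 0)
            close(readFile);
        readFile = -1;
        readLen = 0;
        readSource = XLOG_FROM_ANY;

        /*
         * Hypothetical: complain here instead of relying on the DEBUG2
         * buried in WaitForWALToBecomeAvailable().
         */
        {
            char        fname[MAXFNAMELEN];
            XLogSegNo   failedSegNo;

            XLByteToSeg(targetPagePtr, failedSegNo, wal_segment_size);
            XLogFileName(fname, private->replayTLI, failedSegNo,
                         wal_segment_size);
            ereport(emode_for_corrupt_record(emode, targetRecPtr),
                    (errmsg("could not read WAL segment %s needed for record at %X/%X",
                            fname, LSN_FORMAT_ARGS(targetRecPtr))));
        }
        return XLREAD_FAIL;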

xlogreader has undergone a fair amount of change in the last few releases. As
of now, I can't really tell who is responsible for reporting what kind of
error.

/*
* Data input callback
*
* This callback shall read at least reqLen valid bytes of the xlog page
* starting at targetPagePtr, and store them in readBuf. The callback
* shall return the number of bytes read (never more than XLOG_BLCKSZ), or
* -1 on failure. The callback shall sleep, if necessary, to wait for the
* requested bytes to become available. The callback will not be invoked
* again for the same page unless more than the returned number of bytes
* are needed.
*
* targetRecPtr is the position of the WAL record we're reading. Usually
* it is equal to targetPagePtr + reqLen, but sometimes xlogreader needs
* to read and verify the page or segment header, before it reads the
* actual WAL record it's interested in. In that case, targetRecPtr can
* be used to determine which timeline to read the page from.
*
* The callback shall set ->seg.ws_tli to the TLI of the file the page was
* read from.
*/
XLogPageReadCB page_read;

/*
* Callback to open the specified WAL segment for reading. ->seg.ws_file
* shall be set to the file descriptor of the opened segment. In case of
* failure, an error shall be raised by the callback and it shall not
* return.
*
* "nextSegNo" is the number of the segment to be opened.
*
* "tli_p" is an input/output argument. WALRead() uses it to pass the
* timeline in which the new segment should be found, but the callback can
* use it to return the TLI that it actually opened.
*/
WALSegmentOpenCB segment_open;

My reading of this is that page_read isn't ever supposed to error out - yet we
do so all over the place (e.g. WALDumpReadPage(), XLogPageRead()) - and that
segment_open is supposed to error out - but we don't even use segment_open in
xlogrecovery.c. Who is supposed to error out, then? And why is it OK for the
read callbacks to error out directly if they're supposed to return -1 in case
of failure?
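
To make the contract concrete, a callback written strictly to the letter of
the quoted comment would look roughly like this - purely illustrative, the
my_open_segment() helper is invented for the example:

    static int
    strict_page_read(XLogReaderState *state, XLogRecPtr targetPagePtr,
                     int reqLen, XLogRecPtr targetRecPtr, char *readBuf)
    {
        uint32      off = XLogSegmentOffset(targetPagePtr, wal_segment_size);
        ssize_t     nread;
        int         fd;

        /* open the right segment, set state->seg.ws_tli (invented helper) */
        fd = my_open_segment(targetPagePtr, &state->seg.ws_tli);
        if (fd < 0)
            return -1;          /* no ereport() here, per the contract */

        nread = pg_pread(fd, readBuf, XLOG_BLCKSZ, (off_t) off);
        close(fd);

        if (nread < reqLen)
            return -1;          /* a short read is also just "failure" */

        return Min((int) nread, XLOG_BLCKSZ);
    }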

Somewhat of an aside: it also seems that "The callback shall sleep, if
necessary, to wait for the requested bytes to become available." is outdated,
given that we now *explicitly* don't sleep in some cases and support that via
XLREAD_WOULDBLOCK?
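
If so, something along these lines might be closer to reality for that
sentence (just a suggested wording, not a patch):

     * The callback shall sleep, if necessary, to wait for the requested
     * bytes to become available - unless the reader is operating in
     * nonblocking mode, in which case it shall return XLREAD_WOULDBLOCK
     * instead of waiting.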

I dug through recent changes, expecting to find the problem. But uh, no. I
reproduced this in 9.4, and I think the behaviour might have been introduced
in 9.3 (didn't have a build of that around, hence didn't test that), as part
of:

commit abf5c5c9a4f
Author: Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi>
Date: 2013-02-22 11:43:04 +0200

If recovery.conf is created after "pg_ctl stop -m i", do crash recovery.

Before that commit, XLogPageRead() called XLogFileReadAnyTLI() with the
"incoming" emode - which, in crash recovery, is PANIC in this case. After the
commit, the call to XLogFileReadAnyTLI() was moved into
WaitForWALToBecomeAvailable(), but the emode was changed to be unconditionally
DEBUG2.

    /*
     * Try to restore the file from archive, or read an existing
     * file from pg_xlog.
     */
    readFile = XLogFileReadAnyTLI(readSegNo, DEBUG2,
                                  currentSource == XLOG_FROM_ARCHIVE ? XLOG_FROM_ANY :
                                  currentSource);
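
A hypothetical way to get the old behaviour back for the crash recovery case
might be to stop hardcoding DEBUG2 there, e.g. along these lines (untested,
and whether InArchiveRecovery is the right condition is an assumption on my
part):

    /*
     * Sketch only: keep demoting the failure to DEBUG2 while other sources
     * may still be tried, but let plain crash recovery fail loudly.
     */
    readFile = XLogFileReadAnyTLI(readSegNo,
                                  InArchiveRecovery ? DEBUG2 : PANIC,
                                  currentSource == XLOG_FROM_ARCHIVE ?
                                  XLOG_FROM_ANY : currentSource);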

In 9.4 the server actually came up OK after encountering the problem
(destroying data in the process!); I'm not sure where we started to bogusly
initialize shared memory...

Greetings,

Andres Freund
