Re: Failing start-up archive recovery at Standby mode in PG9.2.4

From: Mitsumasa KONDO <kondo(dot)mitsumasa(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Kyotaro HORIGUCHI <kyota(dot)horiguchi(at)gmail(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)2ndquadrant(dot)com>
Subject: Re: Failing start-up archive recovery at Standby mode in PG9.2.4
Date: 2013-04-26 11:46:48
Message-ID: CADupcHWjBsozhZFZctpvxQryA=ikKL84m2th+B0wgomS3GpMBQ@mail.gmail.com
Lists: pgsql-hackers

Let me explain this problem in more detail.

The problem occurs because a restartpoint creates an illegal WAL file during
archive recovery. However, I have not been able to work out why
CreateRestartPoint() creates the illegal WAL file. My attached patch is really plain…

In the problem case, the first check in XLogFileReadAnyTLI() does not get an
fd, because the proper WAL file does not exist in the archive directory.

XLogFileReadAnyTLI()
>     if (sources & XLOG_FROM_ARCHIVE)
>     {
>         fd = XLogFileRead(log, seg, emode, tli, XLOG_FROM_ARCHIVE, true);
>         if (fd != -1)
>         {
>             elog(DEBUG1, "got WAL segment from archive");
>             return fd;
>         }
>     }

Next, it searches for the WAL file in pg_xlog. There is an illegal WAL file in
pg_xlog, so the function returns the illegal WAL file's fd.

XLogFileReadAnyTLI()
>     if (sources & XLOG_FROM_PG_XLOG)
>     {
>         fd = XLogFileRead(log, seg, emode, tli, XLOG_FROM_PG_XLOG, true);
>         if (fd != -1)
>             return fd;
>     }
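
The search order quoted above can be illustrated with a small standalone sketch. This is not PostgreSQL code: the flag names, the opener type, and the two stub openers are hypothetical stand-ins for XLOG_FROM_ARCHIVE, XLOG_FROM_PG_XLOG, and XLogFileRead(). It only shows that the first source to hand back an fd wins, and that nothing at this stage looks at the file's contents.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flags standing in for XLOG_FROM_ARCHIVE / XLOG_FROM_PG_XLOG. */
#define SRC_ARCHIVE 0x01
#define SRC_PG_XLOG 0x02

/* Hypothetical opener: returns an fd (>= 0) on success, -1 if not found. */
typedef int (*segment_opener)(const char *segname);

/*
 * Try the archive first, then the local directory, and return the first fd
 * obtained. *used_source reports which source supplied it (0 if none).
 * The file's contents are never examined here.
 */
static int
open_segment_any_source(const char *segname, int sources,
                        segment_opener from_archive,
                        segment_opener from_pg_xlog,
                        int *used_source)
{
    int fd;

    assert(used_source != NULL);
    *used_source = 0;

    if (sources & SRC_ARCHIVE)
    {
        fd = from_archive(segname);
        if (fd != -1)
        {
            *used_source = SRC_ARCHIVE;
            return fd;
        }
    }

    if (sources & SRC_PG_XLOG)
    {
        fd = from_pg_xlog(segname);
        if (fd != -1)
        {
            *used_source = SRC_PG_XLOG;
            return fd;
        }
    }

    return -1;
}

/* Stub openers for illustration only: archive has nothing, local dir does. */
static int
archive_missing(const char *segname)
{
    (void) segname;
    return -1;
}

static int
local_present(const char *segname)
{
    (void) segname;
    return 7;               /* arbitrary fd-like value */
}
```

With these stubs, a call that allows both sources returns the local file's fd even though the archive had no such segment, which mirrors the situation described here.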

The returned fd becomes the value of readFile. Of course, readFile is then 0 or
greater, so we break out of the for-loop.

XLogPageRead()
>     readFile = XLogFileReadAnyTLI(readId, readSeg, DEBUG2,
>                                   sources);
>     switched_segment = true;
>     if (readFile >= 0)
>         break;

Next comes the point where the problem appears: the illegal WAL file is read, and an error is raised.

XLogPageRead()
>     if (lseek(readFile, (off_t) readOff, SEEK_SET) < 0)
>     {
>         ereport(emode_for_corrupt_record(emode, *RecPtr),
>                 (errcode_for_file_access(),
>                  errmsg("could not seek in log file %u, segment %u to offset %u: %m",
>                         readId, readSeg, readOff)));
>         goto next_record_is_invalid;
>     }
>     if (read(readFile, readBuf, XLOG_BLCKSZ) != XLOG_BLCKSZ)
>     {
>         ereport(emode_for_corrupt_record(emode, *RecPtr),
>                 (errcode_for_file_access(),
>                  errmsg("could not read from log file %u, segment %u, offset %u: %m",
>                         readId, readSeg, readOff)));
>         goto next_record_is_invalid;
>     }
>     if (!ValidXLOGHeader((XLogPageHeader) readBuf, emode, false))
>         goto next_record_is_invalid;
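
To show why a successfully opened file can still fail at this step, here is a minimal standalone sketch. It assumes, purely for illustration, that the bad file is shorter than expected; this message does not say what exactly is wrong with the file. PAGE_SZ, read_page_at(), and make_temp_file() are hypothetical names, not PostgreSQL identifiers.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define PAGE_SZ 8192        /* assumed page size, standing in for XLOG_BLCKSZ */

enum page_result
{
    PAGE_OK = 0,
    PAGE_SEEK_FAILED = -1,
    PAGE_SHORT_READ = -2
};

/* Seek to off and read exactly one page; report which step failed. */
static enum page_result
read_page_at(int fd, off_t off, char *buf)
{
    assert(buf != NULL);

    if (lseek(fd, off, SEEK_SET) < 0)
        return PAGE_SEEK_FAILED;
    /* A file that ends early yields fewer than PAGE_SZ bytes. */
    if (read(fd, buf, PAGE_SZ) != PAGE_SZ)
        return PAGE_SHORT_READ;
    return PAGE_OK;
}

/*
 * Create an anonymous temporary file holding nbytes zero bytes and return
 * its fd, or -1 on failure. The FILE stream is deliberately left open so
 * that the fd stays valid for the caller.
 */
static int
make_temp_file(size_t nbytes)
{
    FILE   *f = tmpfile();
    char    zeros[512];
    int     fd;

    if (f == NULL)
        return -1;
    fd = fileno(f);
    memset(zeros, 0, sizeof zeros);
    while (nbytes > 0)
    {
        size_t  chunk = nbytes < sizeof zeros ? nbytes : sizeof zeros;

        if (write(fd, zeros, chunk) != (ssize_t) chunk)
            return -1;
        nbytes -= chunk;
    }
    return fd;
}
```

Under that assumption, a 100-byte file opens without error and the seek succeeds, yet read_page_at() reports PAGE_SHORT_READ. The failure only surfaces at read time, well after the fd was accepted.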

I think the point that Horiguchi discovered comes after this point.
We must fix CreateRestartPoint() so that it does not create an illegal WAL file.

Best regards,

--
Mitsumasa KONDO
