From: | Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> |
---|---|
To: | Tomonari Katsumata <t(dot)katsumata1122(at)gmail(dot)com> |
Cc: | Tomonari Katsumata <katsumata(dot)tomonari(at)po(dot)ntts(dot)co(dot)jp>, pgsql-bugs(at)postgresql(dot)org |
Subject: | Re: BUG #8686: Standby could not restart. |
Date: | 2014-01-06 14:38:16 |
Message-ID: | 52CABFD8.8090101@vmware.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
On 12/23/2013 08:15 AM, Tomonari Katsumata wrote:
>> /*
>>> * Initialize shared replayEndRecPtr,
>>> lastReplayedEndRecPtr, and
>>> * recoveryLastXTime.
>>> *
>>> * This is slightly confusing if we're starting from an
>>> online
>>> * checkpoint; we've just read and replayed the
>>> checkpoint record, but
>>> * we're going to start replay from its redo pointer,
>>> which precedes
>>> * the location of the checkpoint record itself. So even
>>> though the
>>> * last record we've replayed is indeed ReadRecPtr, we
>>> haven't
>>> * replayed all the preceding records yet. That's OK for
>>> the current
>>> * use of these variables.
>>> */
>>> SpinLockAcquire(&xlogctl->info_lck);
>>> xlogctl->replayEndRecPtr = ReadRecPtr;
>>> xlogctl->lastReplayedEndRecPtr = EndRecPtr;
>>> xlogctl->recoveryLastXTime = 0;
>>> xlogctl->currentChunkStartTime = 0;
>>> xlogctl->recoveryPause = false;
>>> SpinLockRelease(&xlogctl->info_lck);
>>>
>>
>> I think we need to fix that confusion. Your patch will do it by not
>> setting EndRecPtr yet; that fixes the bug, but leaves those variables in a
>> slightly strange state; I'm not sure what EndRecPtr points to in that case
>> (0 ?), but ReadRecPtr would be set I guess.
>>
> Yes, the values were set like below.
> ReadRecPtr:1/8E7F0B0
> EndRecPtr:0/0
>
>
>
>>
>> Perhaps we should reset replayEndRecPtr and lastReplayedEndRecPtr to the
>> REDO point here, instead of ReadRecPtr/EndRecPtr.
>
> I made another patch.
> I added a ReadRecord to make sure the REDO location is present or not.
> The similar process are done when we use backup_label.
>
> Because the ReadRecord returns a record already read,
> I set ReadRecPtr of the record to EndRecPtr.
> And also I set record->xl_prev to ReadRecPtr.
> As you said, it also worked fine.
>
> I'm not sure we should do same thing when crash recovery occurs, but now I
> added the process when archive recovery is needed.
>
> Please see attached patch.
Hmm. That would still initialize lastReplayedEndRecPtr to the checkpoint
record, when you do crash recovery (or begin archive recovery from a
filesystem snapshot, where you perform crash recovery before starting to
read the archive). I'm not very comfortable with that, even though I
don't see an immediate problem with it.
I also noticed that CheckRecoveryConsistency() compares backupEndPoint
with EndRecPtr, but minRecoveryPoint with lastReplayedEndRecPtr. That's
correct as the code stands, but it's an accident waiting to happen: if
you called CheckRecoveryConsistency() after reading a record with
ReadRecord(), but before fully replaying it, it might conclude that it's
reached the backup end location one record too early. And it's
inconsistent, anyway.
I propose the attached patch. I haven't tested it, but I think it's a
slightly more robust fix.
- Heikki
Attachment | Content-Type | Size |
---|---|---|
fix-lastReplayedEndRecPtr-1.patch | text/x-diff | 3.2 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Andres Freund | 2014-01-06 14:45:32 | Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages |
Previous Message | Heikki Linnakangas | 2014-01-06 14:35:42 | Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages |