Re: Change pg_last_xlog_receive_location not to move backwards

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Change pg_last_xlog_receive_location not to move backwards
Date: 2011-02-13 21:05:07
Message-ID: AANLkTink-e8V5NsMGvGP2T8088WYQz1D3=65hd8U3rrS@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Jan 30, 2011 at 11:12 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Sun, Jan 30, 2011 at 10:44 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>> I do not understand what doing so gets us.
>>
>> Say we previously received 2/3 of a WAL file, and replayed most of it.
>> So now the shared buffers have data that has been synced to that WAL
>> file already, and some of those dirty shared buffers have been written
>> to disk and some have not.  At this point, we need the data in the first
>> 2/3 of the WAL file in order to reach a consistent state.  But now we
>> lose the connection to the master, and then we restore it.  Now we
>> request the entire file from the start rather than from where it
>> left off.
>>
>> Either of two things happens.  Most likely, the newly received WAL file
>> matches the file it is overwriting, in which case there was no
>> point in asking for it.
>>
>> Less likely, the master is feeding us gibberish.  By requesting the
>> full WAL file, we check the header and detect that the master is feeding
>> us gibberish.  Unfortunately, we only detect that fact *after* we have
>> replaced a critical part of our own (previously good) copy of the WAL
>> file with said gibberish.  The standby is now in an unrecoverable state.
>
> Right. To avoid this problem completely, IMO, walreceiver should validate
> the received WAL data before writing it. Or, walreceiver should write the
> WAL to the transient file, and the startup process should rename it to the
> correct name after replaying it.
>
> We should do something like the above?

I don't think we should overwrite local WAL that has already been
applied, unless we already know for sure that it is bad (and suspect
re-streaming might make it right.)

I think the better approach would be to check the existence, and the
segment header, of the WAL segment we already (think we) have in
pg_xlog. And then request re-streaming of that file from the
beginning only if it is either missing or has the wrong header in the
local copy. That might make the code a lot more complex, though. At
least I don't see a simple way to implement it.

>
>> With a bit of malicious engineering, I have created this situation.
>> I don't know how likely it is that something like that could happen
>> accidentally, say with a corrupted file system.  I have been unable
>> to engineer a situation where checking the header actually does
>> any good.  It has either done nothing, or done harm.
>
> OK, I seem to have to consider again why the code which retreats the
> replication starting location exists.
>
> At first, I added the code to prevent a half-baked WAL file. For example,
> please imagine the case where you start the standby server with no WAL
> files in pg_xlog. In this case, if replication starts from the middle of WAL
> file, the received WAL file is obviously broken (i.e., with no WAL data in
> the first half of file). This broken WAL file might cause trouble when we
> restart the standby and even when we just promote it (because the last
> applied WAL file is re-fetched at the end of recovery).
>
> OTOH, we can start replication from the middle of WAL file if we can
> *ensure* that the first half of WAL file already exists. At least, when the
> standby reconnects to the master, we might be able to ensure that and
> start from the middle.
>
>> OK, thanks for the explanation.  Is there a race condition here?  That is,
>> it seems that with your patch this part of the code could get executed
>> after streaming is restarted, but before the streaming ever actually received
>> and wrote anything.  So it would be checking the old header, not the newly
>> about-to-be received header.
>
> As far as I read the code again, there is no such a race condition. When
> the received WAL is read just after streaming is restarted, that part of the
> code seems to always be executed.
>
> Maybe I'm missing something. Could you explain about the problematic
> scenario?

OK, I think you are right. I was thinking that if apply is well
behind receive when the connection is lost, that a new connection
could immediately pass the "havedata" test an so proceed to the header
check perhaps before anything was received. But now I see that we
never try to re-connect until after apply has caught up with receive,
so that race condition cannot happen. Sorry for the false alarm.

By the way, the patch no longer applies cleanly to HEAD, as of commit
d309acf201ab2c5 (Feb 11th). But it looks like a trivial conflict (Two
different patches both correct the same doc typo).

Cheers,

Jeff

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Dimitri Fontaine 2011-02-13 21:06:32 Re: Extensions vs PGXS' MODULE_PATHNAME handling
Previous Message Tom Lane 2011-02-13 20:52:48 Re: Extensions vs PGXS' MODULE_PATHNAME handling