From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: streaming replication breaks horribly if master crashes
Date: 2010-06-17 05:57:55
Message-ID: AANLkTinI949X-OWATDntssWPnRVO5JxkLbdSCpvDl-e6@mail.gmail.com
Lists: pgsql-hackers
On Thu, Jun 17, 2010 at 5:26 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Jun 16, 2010 at 4:14 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>>> The first problem I noticed is that the slave never seems to realize
>>> that the master has gone away. Every time I crashed the master, I had
>>> to kill the wal receiver process on the slave to get it to reconnect;
>>> otherwise it just sat there waiting, either forever or at least for
>>> longer than I was willing to wait.
>>
>> Yes, I've noticed this. That was the reason for forcing walreceiver to
>> shut down on a restart per prior discussion and patches. This needs to
>> be on the open items list ... possibly it'll be fixed by Simon's
>> keepalive patch? Or is it just a tcp_keepalive issue?
>
> I think a TCP keepalive might be enough, but I have not tried to code
> or test it.
The "keepalive on libpq" patch would help.
https://commitfest.postgresql.org/action/patch_view?id=281
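
For illustration only (this is not the patch itself, and I'm not reproducing the libpq parameter names it adds), enabling TCP keepalive on a socket comes down to something like the following Linux-specific sketch; the helper name enable_tcp_keepalive() is mine:

#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

static int
enable_tcp_keepalive(int sockfd, int idle_secs, int interval_secs, int count)
{
    int on = 1;

    /* Turn keepalive probing on for this connection. */
    if (setsockopt(sockfd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;

    /* Seconds of idleness before the first probe is sent. */
    if (setsockopt(sockfd, IPPROTO_TCP, TCP_KEEPIDLE,
                   &idle_secs, sizeof(idle_secs)) < 0)
        return -1;

    /* Seconds between subsequent probes. */
    if (setsockopt(sockfd, IPPROTO_TCP, TCP_KEEPINTVL,
                   &interval_secs, sizeof(interval_secs)) < 0)
        return -1;

    /* Unanswered probes before the connection is declared dead. */
    if (setsockopt(sockfd, IPPROTO_TCP, TCP_KEEPCNT,
                   &count, sizeof(count)) < 0)
        return -1;

    return 0;
}

With settings like that, the walreceiver's kernel would notice within a bounded time that the master has vanished, instead of waiting forever on a dead connection.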
>>> and this just
>>> makes it more likely. After the most recent crash, the master thought
>>> pg_current_xlog_location() was 1/86CD4000; the slave thought
>>> pg_last_xlog_receive_location() was 1/8733C000. After reconnecting to
>>> the master, the slave then thought that
>>> pg_last_xlog_receive_location() was 1/87000000.
>>
>> So, *in this case*, detecting out-of-sequence xlogs (and PANICing) would
>> have actually prevented the slave from being corrupted.
>>
>> My question, though, is detecting out-of-sequence xlogs *enough*? Are
>> there any crash conditions on the master which would cause the master to
>> reuse the same locations for different records, for example? I don't
>> think so, but I'd like to be certain.
>
> The real problem here is that we're sending records to the slave which
> might cease to exist on the master if it unexpectedly reboots. I
> believe that what we need to do is make sure that the master only
> sends WAL it has already fsync'd (Tom suggested on another thread that
> this might be necessary, and I think it's now clear that it is 100%
> necessary).
The attached patch changes walsender so that it always sends WAL up to
LogwrtResult.Flush instead of LogwrtResult.Write.
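
Just to illustrate the idea (this is a simplified standalone sketch, not the walsender code; WalPtr, get_written_ptr, get_flushed_ptr and stream_wal_up_to are placeholder names standing in for XLogRecPtr, LogwrtResult.Write, LogwrtResult.Flush and the actual send loop, and the positions are dummy values):

#include <stdio.h>

typedef unsigned long long WalPtr;      /* simplified stand-in for XLogRecPtr */

/* Dummy positions for the sake of a runnable example. */
static WalPtr get_written_ptr(void) { return 0x187340000ULL; } /* LogwrtResult.Write */
static WalPtr get_flushed_ptr(void) { return 0x186CD4000ULL; } /* LogwrtResult.Flush */

static void
stream_wal_up_to(WalPtr endptr)
{
    printf("sending WAL up to %llX\n", endptr);
}

int
main(void)
{
    /*
     * Before the patch: WAL was streamed up to the write pointer, i.e.
     * including records that could still vanish if the master rebooted
     * before fsync:
     *
     *     stream_wal_up_to(get_written_ptr());
     *
     * After the patch: only WAL already flushed to disk on the master is
     * ever sent, so a crashed-and-restarted master can never end up
     * behind its standby.
     */
    stream_wal_up_to(get_flushed_ptr());
    return 0;
}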
> But I'm not sure how this will play with fsync=off - if
> we never fsync, then we can't ever really send any WAL without risking
> this failure mode. Similarly with synchronous_commit=off, I believe
> that the next checkpoint will still fsync WAL, but the lag might be
> long.
First of all, in the fsync=off case we should not restart the master after a
crash at all, because doing so can corrupt the master database itself.
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Attachment | Content-Type | Size
---|---|---
send_after_fsync_v1.patch | application/octet-stream | 3.0 KB