From: | Amit Kapila <amit(dot)kapila(at)huawei(dot)com> |
---|---|
To: | "'Florian Pflug'" <fgp(at)phlo(dot)org>, "'Fujii Masao'" <masao(dot)fujii(at)gmail(dot)com> |
Cc: | "'Andres Freund'" <andres(at)2ndquadrant(dot)com>, "'Hannu Krosing'" <hannu(at)2ndquadrant(dot)com>, "'Sameer Thakur'" <samthakur74(at)gmail(dot)com>, "'Ants Aasma'" <ants(at)cybertec(dot)at>, <sthomas(at)optionshouse(dot)com>, "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "'Samrat Revagade'" <revagade(dot)samrat(at)gmail(dot)com>, "'PostgreSQL-development'" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Inconsistent DB data in Streaming Replication |
Date: | 2013-04-17 10:22:59 |
Message-ID: | 004d01ce3b55$8ab72380$a0256a80$@kapila@huawei.com |
Lists: | pgsql-hackers |
On Monday, April 15, 2013 1:02 PM Florian Pflug wrote:
> On Apr14, 2013, at 17:56 , Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> > At fast shutdown, after walsender sends the checkpoint record and
> > closes the replication connection, walreceiver can detect the close
> > of connection before receiving all WAL records. This means that,
> > even if walsender sends all WAL records, walreceiver cannot always
> > receive all of them.
>
> That sounds like a bug in walreceiver to me.
>
> The following code in walreceiver's main loop looks suspicious:
>
>     /*
>      * Process the received data, and any subsequent data we
>      * can read without blocking.
>      */
>     for (;;)
>     {
>         if (len > 0)
>         {
>             /* Something was received from master, so reset timeout */
>             ...
>             XLogWalRcvProcessMsg(buf[0], &buf[1], len - 1);
>         }
>         else if (len == 0)
>             break;
>         else if (len < 0)
>         {
>             ereport(LOG,
>                     (errmsg("replication terminated by primary server"),
>                      errdetail("End of WAL reached on timeline %u at %X/%X",
>                                startpointTLI,
>                                (uint32) (LogstreamResult.Write >> 32),
>                                (uint32) LogstreamResult.Write)));
>             ...
>         }
>         len = walrcv_receive(0, &buf);
>     }
>
>     /* Let the master know that we received some data. */
>     XLogWalRcvSendReply(false, false);
>
>     /*
>      * If we've written some records, flush them to disk and
>      * let the startup process and primary server know about
>      * them.
>      */
>     XLogWalRcvFlush(false);
>
> The loop at the top looks fine - it specifically avoids throwing
> an error on EOF. But the code then proceeds to XLogWalRcvSendReply()
> which doesn't seem to have the same smarts - it simply does
>
>     if (PQputCopyData(streamConn, buffer, nbytes) <= 0 ||
>         PQflush(streamConn))
>         ereport(ERROR,
>                 (errmsg("could not send data to WAL stream: %s",
>                         PQerrorMessage(streamConn))));
>
> Unless I'm missing something, that certainly seems to explain
> how a standby can lag behind even after a controlled shutdown of
> the master.
Do you mean that, because an error has occurred, walreceiver would not be
able to flush the received WAL, which could result in loss of WAL?
I think that even if an error occurs, it will call flush in WalRcvDie()
before terminating walreceiver.
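The control flow I have in mind can be sketched roughly as below. This is only a toy model, not walreceiver code: SketchReceiver and the sketch_* functions are hypothetical names, and it assumes that the exit path (WalRcvDie() in the real code) flushes whatever has been received, even when the reply to the primary fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for walreceiver state; not PostgreSQL code. */
typedef struct
{
    size_t received;   /* bytes written so far, possibly not yet flushed */
    size_t flushed;    /* bytes known to be flushed */
    bool   conn_open;  /* is the connection to the primary still open? */
} SketchReceiver;

/* Models the flush step: everything received becomes durable. */
static void
sketch_flush(SketchReceiver *r)
{
    assert(r != NULL);
    r->flushed = r->received;
}

/*
 * Models the reply step. Returning false stands in for the
 * ereport(ERROR) raised when the send fails on a closed connection.
 */
static bool
sketch_send_reply(const SketchReceiver *r)
{
    assert(r != NULL);
    return r->conn_open;
}

/* Models the exit callback (assumed here to flush before terminating). */
static void
sketch_die(SketchReceiver *r)
{
    sketch_flush(r);
}

/*
 * One pass of the main loop after nbytes were received.
 * Returns true if the receiver keeps running, false if it terminated.
 */
static bool
sketch_iteration(SketchReceiver *r, size_t nbytes, bool primary_closed)
{
    assert(r != NULL);
    r->received += nbytes;
    if (primary_closed)
        r->conn_open = false;

    if (!sketch_send_reply(r))
    {
        /* Error path: the normal flush is skipped, the exit callback runs. */
        sketch_die(r);
        return false;
    }

    sketch_flush(r);   /* normal path */
    return true;
}
```

In this model the received data ends up flushed on both paths; whether the real exit path behaves the same way is exactly the assumption to verify.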
With Regards,
Amit Kapila.