Re: Timeout for asynchronous replication Re: Timeout and wait-forever in sync rep

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Timeout for asynchronous replication Re: Timeout and wait-forever in sync rep
Date:	2010-12-20 14:35:57
Message-ID:	AANLkTikEj7BRe3_O7Gie4+Um4tZ0mjMks_5nxwJC_wzb@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Dec 20, 2010 at 3:17 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Tue, Dec 7, 2010 at 12:20 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> Yeah. If we rely on the TCP send buffer filling up, then the amount
>> of time the master takes to notice a dead standby is going to be hard
>> for the user to predict. I think the standby ought to send some sort
>> of heartbeat and the master should declare the standby dead if it
>> doesn't see a heartbeat soon enough. Maybe the heartbeat could even
>> include the receive/fsync/replay LSNs, so that sync rep can use the
>> same machinery but with more aggressive policies about when they must
>> be sent.
>
> OK. How about keepalive-like parameters and behaviors?
>
> replication_keepalives_idle
> replication_keepalives_interval
> replication_keepalives_count
>
> The master sends the keepalive packet if replication_keepalives_idle
> elapsed after receiving the last ACK packet including the receive/
> fsync/replay LSNs from the standby. OTOH, the standby sends the
> ACK packet back to the master as soon as receiving the keepalive
> packet.
>
> If the master could not receive the ACK packet for
> replication_keepalives_interval, it repeats sending the keepalive
> packet and receiving the ACK replication_keepalives_count -1
> times. If no ACK packet has finally arrived, the master thinks the
> standby has been dead.

This doesn't really make sense, because you're connecting over a TCP
connection. Once you send the first keepalive, TCP will keep retrying
in some way that we have no control over. If those packets aren't
getting through, adding more data to what has to be transmitted seems
unlikely to do anything useful. I think the parameters we can
usefully set are:

- how long does the master wait before sending a keepalive request?
- how long does the master wait after sending a keepalive before
declaring the slave dead and closing the connection?
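As a rough illustration of those two timers, here is a minimal Python sketch of a master-side state machine. The class and parameter names (KeepaliveProber, idle_before_probe, wait_after_probe) are made up for this example and are not taken from any PostgreSQL code; actual sending and socket handling are left out.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"        # nothing outstanding
    PROBING = "probing"  # keepalive request sent, waiting for a reply
    DEAD = "dead"        # standby declared dead


class KeepaliveProber:
    """Hypothetical master-side tracker for the two timers (a sketch only)."""

    def __init__(self, idle_before_probe: float, wait_after_probe: float):
        self.idle_before_probe = idle_before_probe  # wait before sending a keepalive
        self.wait_after_probe = wait_after_probe    # wait after sending before giving up
        self.state = State.IDLE
        self.since = 0.0  # time of the last standby message, or of the probe

    def on_message(self, now: float) -> None:
        """Any message from the standby puts a live connection back to IDLE."""
        if self.state is not State.DEAD:
            self.state, self.since = State.IDLE, now

    def tick(self, now: float) -> State:
        """Advance the timers and return the resulting state."""
        elapsed = now - self.since
        if self.state is State.IDLE and elapsed >= self.idle_before_probe:
            # A real implementation would send the keepalive request here.
            self.state, self.since = State.PROBING, now
        elif self.state is State.PROBING and elapsed >= self.wait_after_probe:
            # A real implementation would close the connection here.
            self.state = State.DEAD
        return self.state
```

Note that there is no retry counter: once the request is handed to TCP, the only remaining decision is how long to wait for the reply.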

But this can be further simplified. The slave doesn't really need the
master to prompt it to send acknowledgments. It only needs to send
them sufficiently often. As part of the start-replication sequence,
let's have the master tell the slave "send me an acknowledgment at
least every N seconds". And then the slave must do that. The master
then has some value K > N, such that if no acknowledgment is received
for K seconds, it closes the connection.
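A minimal Python sketch of the master's side of that simplified scheme, assuming timestamps in seconds. StandbyMonitor and its fields are hypothetical names for illustration; N and K are the values from the paragraph above.

```python
from dataclasses import dataclass


@dataclass
class StandbyMonitor:
    """Hypothetical master-side record for one standby connection."""
    ack_interval: float   # N: told to the standby at replication start
    idle_timeout: float   # K: silence after which the master gives up
    last_ack_at: float = 0.0

    def on_ack(self, now: float) -> None:
        """Record that an acknowledgment arrived at time `now`."""
        self.last_ack_at = now

    def should_disconnect(self, now: float) -> bool:
        """True once K seconds have passed with no acknowledgment."""
        return now - self.last_ack_at >= self.idle_timeout


def next_ack_due(last_sent_at: float, ack_interval: float) -> float:
    """Standby side: latest time by which the next acknowledgment must go out."""
    return last_sent_at + ack_interval
```

The master never sends a request in this version; it only watches the clock, and the standby only has to honor the interval it was given.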

The only reason to have the master send explicit keepalive requests
(vs. just telling the client the interval) is if the master might
request them for some reason other than timer expiration. Since the
main point of this is to detect the situation where the slave has e.g.
power cycled so that the connection is gone but the master doesn't
know it, you could imagine a system where, when a new replication
connection is received, we request keepalives on all of the existing
connections to see if any of them are defunct. But I don't really
think it needs to be quite that complicated.

Another consideration is that you could configure the
keepalive-frequency on the slave and the declare-dead-time on the
master. Then the master wouldn't need to tell the slave the
keepalive-frequency at replication start-up time. But that might also
increase the chances of incompatible settings (e.g. slave's keepalive
frequency is >= master's declare-dead-time), which would result in a
lot of unnecessary reconnects. If both parameters are configured on
the master, then we can enforce that declare-dead-time >
keepalive-frequency.

So I suggest:

replication_keepalive_time - how often the slave is instructed to send
acknowledgments when idle
replication_idle_timeout - the period of inactivity after which the
master closes the connection to the slave
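
For illustration, the check the master could apply when both values are set on it might look like the following Python sketch. The function name is hypothetical, the two parameter names are just the ones proposed above, and rejecting non-positive values is an assumption of this sketch.

```python
def check_replication_timeouts(replication_keepalive_time: int,
                               replication_idle_timeout: int) -> None:
    """Reject settings that would make a healthy standby look dead.

    Both values are in seconds. Raises ValueError on an invalid combination.
    """
    if replication_keepalive_time <= 0:
        # Assumption of this sketch: zero or negative intervals are not allowed.
        raise ValueError("replication_keepalive_time must be positive")
    if replication_idle_timeout <= replication_keepalive_time:
        # The timeout would fire before the next acknowledgment is even due,
        # causing needless disconnects and reconnects.
        raise ValueError(
            "replication_idle_timeout must be greater than "
            "replication_keepalive_time"
        )
```

For example, check_replication_timeouts(10, 30) passes, while check_replication_timeouts(30, 30) raises ValueError.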

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Re: Timeout for asynchronous replication Re: Timeout and wait-forever in sync rep at 2010-12-20 08:17:27 from Fujii Masao
