From: | Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: walsender doesn't send keepalives when writes are pending |
Date: | 2014-03-05 16:26:13 |
Message-ID: | 53175025.9040805@vmware.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 02/25/2014 06:41 PM, Robert Haas wrote:
> On Tue, Feb 25, 2014 at 11:23 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>> Usually that state will be reached very quickly because before
>> that we're writing data to the network as fast as it can be read from
>> disk.
>
> I'm unimpressed. Even if that is in practice true, making the code
> self-consistent is a goal of non-trivial value. The timing of sending
> keep-alives has no business depending on the state of the write queue,
> and right now it doesn't. Your patch would make it depend on that,
> mostly by accident AFAICS.
Yeah, the timeout should should be checked regardless of the status of
the write queue. Per the attached patch.
While looking at this, I noticed that the time we sleep between each
iteration is calculated in a funny way. It's not this patch's fault, but
moving the code made it more glaring. After the patch, it looks like this:
> TimestampTz timeout;
> long sleeptime = 10000; /* 10 s */
> ...
> /*
> * If wal_sender_timeout is active, sleep in smaller increments
> * to not go over the timeout too much. XXX: Why not just sleep
> * until the timeout has elapsed?
> */
> if (wal_sender_timeout > 0)
> sleeptime = 1 + (wal_sender_timeout / 10);
>
> /* Sleep until something happens or we time out */
> ...
> WaitLatchOrSocket(&MyWalSnd->latch, wakeEvents,
> MyProcPort->sock, sleeptime);
> ...
> /*
> * Check for replication timeout. Note we ignore the corner case
> * possibility that the client replied just as we reached the
> * timeout ... he's supposed to reply *before* that.
> */
> timeout = TimestampTzPlusMilliseconds(last_reply_timestamp,
> wal_sender_timeout);
> if (wal_sender_timeout > 0 && GetCurrentTimestamp() >= timeout)
> {
> ...
> }
The logic was the same before the patch, but I added the XXX comment
above. Why do we sleep in increments of 1/10 of wal_sender_timeout?
Originally, the code calculated when the next wakeup should happen, by
adding wal_sender_timeout (or replication_timeout, as it was called back
then) to the time of the last reply. Why don't we do that?
The code seems to have just slowly devolved into that. It was changed
from sleep-until-timeout to sleep 1/10th of wal_sender_timeout by a
patch that made walsender send a keep-alive message to the client every
time it wakes up, and I guess the idea was to send a message at an
interval that's 1/10th of the timeout. But that idea is long gone by
now; first, commit da4efa13d801ccc179f1d2c24d8a60c4a2f8ede9 turned off
sending the keep-alive messages by default, and then
6f60fdd7015b032bf49273c99f80913d57eac284 turned them back on, but so
that it's only sent once when 1/2 of wal_sender_timeout has elapsed.
I think that should be changed back to sleep until the next deadline of
when something needs to happen, instead of polling. But that can be a
separate patch.
- Heikki
Attachment | Content-Type | Size |
---|---|---|
walsender-always-send-keepalives-2.patch | text/x-diff | 2.8 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Stephen Frost | 2014-03-05 16:28:00 | Re: jsonb and nested hstore |
Previous Message | Merlin Moncure | 2014-03-05 16:24:27 | Re: jsonb and nested hstore |