From: | Andres Freund <andres(at)2ndquadrant(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: walsender doesn't send keepalives when writes are pending |
Date: | 2014-02-25 15:54:47 |
Message-ID: | 20140225155447.GQ6718@awork2.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 2014-02-25 10:45:55 -0500, Robert Haas wrote:
> On Fri, Feb 14, 2014 at 7:05 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > In WalSndLoop() we do:
> >
> > wakeEvents = WL_LATCH_SET | WL_POSTMASTER_DEATH | WL_TIMEOUT |
> > WL_SOCKET_READABLE;
> >
> > if (pq_is_send_pending())
> > wakeEvents |= WL_SOCKET_WRITEABLE;
> > else if (wal_sender_timeout > 0 && !ping_sent)
> > {
> > ...
> > if (GetCurrentTimestamp() >= timeout)
> > WalSndKeepalive(true);
> > ...
> >
> > I think those two if's should simply be separate. There's no reason not
> > to ask for a ping when we're writing. On a busy server that might be the
> > case most of the time.
> > The if (pq_is_send_pending()) should also be *after* sending the
> > keepalive, after all, it might not immediately have been flushed as
> > unlikely as that is.ackers
>
> I spent an inordinate amount of time looking at this patch. I'm not
> sure your proposed change will accomplish the desired effect. The
> thing is that the code you quote is within an if-block gated by
> (caughtup && !streamingDoneSending) || pq_is_send_pending().
> Currently, therefore, the behavior is that we wait on the latch
> *either when* we're caught up (and thus need to wait for more WAL) or
> when the outbound queue is full (and thus we need to wait for it to
> drain), but we send a keep-alive only in the former case (namely,
> we're caught up).
>
> Your proposed change would cause us to send keep-alives when we're
> caught up, or when we're not caught up but the write queue happens to
> be full. That doesn't make much sense. There might be a reason to
> start sending keep-alives when we're not caught up, but even if we
> want that behavior change there's no reason to do it only when the
> write queue is full.
I am not sure I can follow. Why doesn't it make sense to send out the
keepalive (with replyRequested = true) when we're busy sending stuff
(which will be the case most of the time on a busy server)?
The point is that clients like pg_receivexlog and pg_basebackup don't
send an keepalive response all the time they receive something (which is
perfectly fine), but just when their internal timer says it's time to do
so, or if the walsender asks them to do so. So, if the server has a
*lower* timeout than configured for pg_receivexlog and the server is a
busy one, you'll get timeouts relatively often. This is a pretty
frequent situation in synchronous replication setups because the default
timeout is *way* too high for that.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Andres Freund | 2014-02-25 16:01:30 | Re: [PATCH] Use MAP_HUGETLB where supported (v3) |
Previous Message | Robert Haas | 2014-02-25 15:53:44 | Re: [PATCH] Use MAP_HUGETLB where supported (v3) |