hot_standby_feedback doesn't work on busy servers in 9.3+

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Subject: hot_standby_feedback doesn't work on busy servers in 9.3+
Date: 2014-01-15 13:01:50
Message-ID: 20140115130150.GB8653@awork2.anarazel.de
Lists: pgsql-bugs

Hello,

In 9.3+, hot_standby_feedback doesn't work properly when the primary is
busy. Symptoms include a) spurious query cancels on the standby because no
feedback message has been sent yet, and b) bloat on the primary because
feedback isn't sent regularly, leaving stale xmins on the primary.

The relevant code is in walreceiver.c's WalReceiverMain():

    len = walrcv_receive(NAPTIME_PER_CYCLE, &buf);
    if (len != 0)
    {
        ...
        /* Let the master know that we received some data. */
        XLogWalRcvSendReply(false, false);

        /*
         * If we've written some records, flush them to disk and
         * let the startup process and primary server know about
         * them.
         */
        XLogWalRcvFlush(false);
    }
    else
    {
        ...
        XLogWalRcvSendReply(requestReply, requestReply);
        XLogWalRcvSendHSFeedback(false);
    }

So, as long as the connection always has data, we never reach the else
branch and never send feedback. That's pretty easy to demonstrate by
running pgbench with 8 or so clients and threads (-c 8 -j 8) against the
primary.
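
To make the starvation concrete, here is a toy model of that branch
structure. It is only a sketch, not walreceiver code: count_feedback_calls()
and the got_data array are hypothetical names, and it counts how often the
feedback call site is reached, not how many messages would actually go out.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model (hypothetical, not PostgreSQL code) of the receive loop
 * quoted above. got_data[i] says whether the i-th walrcv_receive()
 * call returned data (len != 0). Returns how many times the
 * XLogWalRcvSendHSFeedback() call site would be reached.
 */
int
count_feedback_calls(const bool *got_data, int cycles)
{
    int feedback_calls = 0;

    assert(cycles >= 0);
    for (int i = 0; i < cycles; i++)
    {
        if (got_data[i])
        {
            /* Busy path: reply and flush only, no feedback call. */
            continue;
        }

        /* Idle path: the only place the feedback call appears. */
        feedback_calls++;
    }
    return feedback_calls;
}
```

If every entry of got_data is true, which is what a constantly busy
connection looks like, the function returns 0 however many cycles are run.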

Looking into this I also noticed that the busy path is odd: a) why are
we sending a reply before flushing things to disk? And b)
XLogWalRcvFlush() will do its own XLogWalRcvSendReply() anyway.
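
The ordering issue can be sketched with another toy model. ReceiverModel and
the model_* functions are hypothetical stand-ins, and the model assumes that
every reply call really sends a message and that the flush always has
something to flush. The real functions may skip work in cases this sketch
ignores.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the walreceiver's position tracking. */
typedef struct ReceiverModel
{
    uint64_t write_pos;     /* bytes written so far */
    uint64_t flush_pos;     /* bytes flushed so far */
    int      replies_sent;  /* total replies sent to the primary */
    int      stale_replies; /* replies sent while flush_pos < write_pos */
} ReceiverModel;

/* Stands in for XLogWalRcvSendReply(): reports the current positions. */
void
model_send_reply(ReceiverModel *m)
{
    m->replies_sent++;
    if (m->flush_pos < m->write_pos)
        m->stale_replies++;
}

/* Stands in for XLogWalRcvFlush(): flushes, then sends its own reply. */
void
model_flush(ReceiverModel *m)
{
    m->flush_pos = m->write_pos;
    model_send_reply(m);
}

/* One pass through the busy path: write, reply, then flush. */
void
model_busy_cycle(ReceiverModel *m, uint64_t nbytes)
{
    m->write_pos += nbytes;
    model_send_reply(m);    /* sent before the flush, so flush_pos lags */
    model_flush(m);         /* sends a second reply with fresh positions */
}
```

Under these assumptions each busy cycle produces two replies, and the first
one reports a flush position that is already about to be superseded.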

A good part of that seems to have been introduced in commit
6f60fdd7015b032bf49273c99f80913d57eac284.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
