From: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Replication server timeout patch |
Date: | 2011-03-16 07:49:29 |
Message-ID: | AANLkTik3-GETvakKDTwNXC3OVUr+w3DFMiriG2aiTguy@mail.gmail.com |
Lists: | pgsql-hackers |
On Sat, Mar 12, 2011 at 4:34 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Mar 11, 2011 at 8:29 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>> I think we should consider making this change for 9.1. This is a real
>>> wart, and it's going to become even more of a problem with sync rep, I
>>> think.
>>
>> Yeah, that's welcome! Please feel free to review the patch.
>
> I discussed this with Heikki on IM.
>
> I think we should rip all the GUC change stuff out of this patch and
> just decree that if you set a timeout, you get a timeout. If you set
> this inconsistently with wal_receiver_status_interval, then you'll get
> lots of disconnects. But that's your problem. This may seem a little
> unfriendly, but the logic in here is quite complex and still isn't
> going to really provide that much protection against bad
> configurations. The only realistic alternative I see is to define
> replication_timeout as a multiple of the client's
> wal_receiver_status_interval, but that seems quite annoyingly
> unfriendly. A single replication_timeout that applies to all slaves
> doesn't cover every configuration someone might want, but it's simple
> and easy to understand and should cover 95% of cases. If we find that
> it's really necessary to be able to customize it further, then we
> might go the route of adding the much-discussed standby registration
> stuff, where there's a separate config file or system table where you
> can stipulate that when a walsender with application_name=foo
> connects, you want it to get wal_receiver_status_interval=$FOO. But I
> think that complexity can certainly wait until 9.2 or later.
>
> I also think that the default for replication_timeout should not be 0.
> Something like 60s seems about right. That way, if you just use the
> default settings, you'll get pretty sane behavior - a connectivity
> hiccup that lasts more than a minute will bounce the client. We've
> already gotten reports of people who thought they were replicating
> when they really weren't, and had to fiddle with settings and struggle
> to try to make it robust. This should make things a lot nicer for
> people out of the box, but it won't if it's disabled out of the box.
>
> On another note, there doesn't appear to be any need to change the
> return value of WaitLatchOrSocket().
Agreed. I'll change the patch.
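
For illustration only, the simplified rule ("if you set a timeout, you get a timeout") could be sketched roughly as below. This is not code from the patch: the function names, the millisecond units, and the idea of a separate consistency check are all assumptions made for the example. The only behaviour taken from the discussion is that a value of 0 disables the timeout.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helpers, not taken from the patch.
 * All times are in milliseconds (an assumption for this sketch).
 */

/*
 * Returns true when the walsender should drop the connection:
 * a timeout is set and no reply has arrived within that window.
 */
static bool
replication_timed_out(int64_t last_reply_ms, int64_t now_ms, int64_t timeout_ms)
{
    if (timeout_ms <= 0)
        return false;           /* 0 means the timeout is disabled */
    return now_ms - last_reply_ms >= timeout_ms;
}

/*
 * Returns true when the standby's status interval leaves room for at
 * least one status report inside the timeout window. Under the proposal
 * the server does NOT enforce this; a bad combination simply produces
 * repeated disconnects.
 */
static bool
settings_look_consistent(int64_t timeout_ms, int64_t status_interval_ms)
{
    if (timeout_ms <= 0)
        return true;            /* no timeout, nothing to conflict with */
    return status_interval_ms > 0 && status_interval_ms < timeout_ms;
}
```

With a 60s timeout and a 10s status interval, several reports fit inside each window, so only a hiccup lasting a full minute would disconnect the standby. With a status interval longer than the timeout, the first function fires on every cycle, which is the "lots of disconnects" case described above.
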
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center