From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Synch Rep for CommitFest 2009-07
Date: 2009-07-17 09:15:11
Message-ID: 3f0b79eb0907170215n5765442bw4ea99b031199ba5b@mail.gmail.com
Lists: pgsql-hackers
Hi,
On Thu, Jul 16, 2009 at 6:00 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> The archive should not normally contain partial XLOG files, only if you
> manually copy one there after primary has crashed. So I don't think
> that's something we need to support.
You are right. Also, if the last valid record ends in the middle of the
restored file (e.g. because of an XLOG_SWITCH record), <begin> should point
to the head of the next file.
> Hmm. You only need the timeline history file if the base backup was
> taken in an earlier timeline. That situation would only arise if you
> (manually) take a base backup, restore to a server (which creates a new
> timeline), and then create a slave against that server. At least in the
> 1st phase, I think we can assume that the standby has access to the same
> archive, and will find the history file from there. If not, throw an
> error. We can add more bells and whistles later.
Okay, I'll set the history-file problem aside for later consideration.
> As the patch stands, new walsender connections are refused when one is
> active already. What if the walsender connection is in a zombie state?
> For example, it's trying to send WAL to the slave, but the network
> connection is down, and the packets are going to a black hole. It will
> take a while for the TCP layer to declare the connection dead, and close
> the socket. During that time, you can't connect a new slave to the
> master, or the same slave using a better network connection.
>
> The most robust way to fix that is to support multiple walsenders. The
> zombie walsender can take its time to die, while the new walsender
> serves the new connection. You could tweak SO_TIMEOUTs and stuff, but
> even then the standby process could be in some weird hung state.
>
> And of course, when we get around to add support for multiple slaves,
> we'll have to do that anyway. Better get it right to begin with.
Thanks for the detailed description! I was thinking that a new GUC
replication_timeout and some keepalive parameters would be enough to
handle such trouble. But I agree that supporting multiple walsenders
is the better solution, so I'll tackle that problem.
> Even in synchronous replication, a backend should only have to wait when
> it commits. You would only see the difference with very large
> transactions that write more WAL than fits in wal_buffers, though, like
> data loading.
That's right.
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center