Quick Links

Re: Understanding streaming replication

From:	"Albe Laurenz" <laurenz(dot)albe(at)wien(dot)gv(dot)at>
To:	"Pawel Veselov EXTERN" <pawel(dot)veselov(at)gmail(dot)com>, <pgsql-general(at)postgresql(dot)org>
Subject:	Re: Understanding streaming replication
Date:	2012-11-12 09:36:54
Message-ID:	D960CB61B694CF459DCFB4B0128514C208AF03E0@exadv11.host.magwien.gv.at
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

I'll try to answer the questions I can.

Pawel Veselov wrote:
> I've been struggling with understanding all the necessary pieces for streaming replication. So I put
> down the pieces as I did understand them, and would appreciate if you guys could point out any of the
> stuff I understood or have done wrong.
>
> The set up is pgpool + streaming replication + hot stand by. No load balancing, stand-by nodes will
> not receive any application queries (I don't have that big of a query load, and I don't want to risk
> inconsistent reads). There are no shared file systems, but there is a way to rsync/scp files between
> nodes. Fail-over is automatic, and should kick in within reasonably small period after master failure.
>
> 1. Archiving. Should be turned on on all the nodes. The archive command should copy the archive file
> to the local archive directory, and rsync archive directory between all the nodes. My understanding is
> that archiving is necessary if a stand-by node ever "missed" enough WAL updates to need an old enough
> WAL that might have been removed from pg_xlog.

You don't give details about how the rsync is triggered,
but I'd advise against having rsync as part of archive_command.
First, it is slow and if there is a lot of activity, the
archiver will not be able to keep up.
Second, if rsync fails, the WAL file will not be considered
archived.

Both these things will keep the WAL files from being deleted
from pg_xlog.

I'd schedule rsync as a cron job or similar.

> QUESTION: After the failover, the new master will start archiving its WAL files. These archived WALs
> will not collide in any way with the archived WALs generated by previous master(s)?

They will not, because the standby starts a new "time line"
when it is promoted to primary, which will result in new
WAL file names.

> QUESTION: What is a good policy for archive clean up? From the perspective to only remove archive
> files that are guaranteed to never be required by any nodes.

You cannot tell from the primary's side.
Since you also need the archives to restore an online backup,
I'd keep them a long as your backup policy dictates.
I hope you don't rely on standby databases for backup (just
imagine an accidental DROP TABLE that gets propagated to all
standbys withing seconds).

> 2. Failover. On master failure, pgpool will automatically select a new master, and degenerate all
> other nodes. The cluster is now in the emergency state and requires manual intervention for
> reconfiguration and recovery. pgpool executes a script to promote a node, that script will create a
> trigger file on a newly selected master node, and postgres will exist stand-by mode.
>
> QUESTION: If multiple pgpools are running, and if there are no network problems, and configuration
> files are identical, is there any guarantee that the same stand-by node will be selected for
> promotion? Concern here is that with configuration of (M-SB0-SB1) one pgpool decides to promote SB0
> and another - SB1, causing both of them to enter master mode, and splitting the cluster. It does look
> that pgpool will always select next "alive" node for promotion, but I couldn't find a definitive
> statement on that.

I don't know about pgpool and its abilities to handle
cluster failover, but I wouldn't go this way at all.
Even if the answer were that in the circumstances you
describe things would work, you can depend on it that
things will go wrong in ways different from what you
expect, e.g. a broken network card.
The consequences would be worse than I'd like to imagine.

If you want reliable automatic failover, consider cluster
software.

> 3. Recovery. That part is a bit confusing. The majority of the documentation says that in this case,
> the node should be re-loaded from the base backup, obtained from the master. I'm not sure why this is
> necessary, if there are enough archived WALs.

Because of the new time line; streaming replication cannot
(yet) recover across a time line change.

> QUESTION: Is there any metric to understand whether hauling base will be slower/faster than replaying
> missed WALs? Anyway, pgpool only has one recovery mechanism, and it does invoke a base restore from
> whatever current master is.
>
> PROBLEM: This I see as a problem. The only way that I see to re-attach a node to the pgpool, short of
> restarting it, is to call pcp_recovery_node. This will make the master take a base back up, push it to
> the stand-by that needs recovery, and re-start the stand-by node. I am not sure if there is a good way
> to check if that node has already been recovered. That because if there are more than 2 pgpools, they
> both will attempt to recover the same stand-by, and this will probably get ugly.

Yours,
Laurenz Albe

In response to

Understanding streaming replication at 2012-11-10 01:30:48 from Pawel Veselov

Responses

Re: Understanding streaming replication at 2012-11-12 16:39:36 from Philippe Amelant
Re: Understanding streaming replication at 2012-11-12 18:11:11 from Pawel Veselov

Browse pgsql-general by date

	From	Date	Subject
Next Message	Ivan Voras	2012-11-12 10:03:14	PostgreSQL and a clustered file system
Previous Message	Albe Laurenz	2012-11-12 08:55:25	Re: explain plan visibility