Re: initial sync of multiple streaming slaves simultaneously

From: Lonni J Friedman <netllama(at)gmail(dot)com>
To: Mike Roest <mike(dot)roest(at)replicon(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: initial sync of multiple streaming slaves simultaneously
Date: 2012-09-19 19:34:20
Message-ID: CAP=oouEg5ppbY-9svtApj2G2mJS+8aUU_sUtG1m=hPjmVTZRhQ@mail.gmail.com
Lists: pgsql-general

Just curious, is there a reason why you can't use pg_basebackup ?
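For reference, a minimal invocation for seeding a standby with pg_basebackup might look something like this (a sketch only; the hostname, user, and data directory are placeholders, and the flags assume 9.1/9.2-era syntax):

```shell
# Run on each new slave against an empty data directory.
# -x includes the WAL segments needed to reach a consistent state,
# -P shows progress; host/user/path below are placeholders.
pg_basebackup -h master.example.com -U replication \
    -D /var/lib/pgsql/data -P -x
```

Because each slave takes its own base backup, two slaves can run this concurrently without sharing a pg_start_backup/pg_stop_backup window.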

On Wed, Sep 19, 2012 at 12:27 PM, Mike Roest <mike(dot)roest(at)replicon(dot)com> wrote:
>
>> Is there any hidden issue with this that we haven't seen? Or does anyone
>> have suggestions for an alternate procedure that will allow two slaves to
>> sync concurrently?
>>
> With some more testing I've done today, I seem to have found an issue with
> this procedure.
> When the slave starts up after the sync, it reaches what it thinks is a
> consistent recovery point very quickly, based on the pg_stop_backup location.
>
> eg:
> (from the recover script)
> 2012-09-19 12:15:02: pgsql_start start
> 2012-09-19 12:15:31: pg_start_backup
> 2012-09-19 12:15:31: -----------------
> 2012-09-19 12:15:31: 61/30000020
> 2012-09-19 12:15:31: (1 row)
> 2012-09-19 12:15:31:
> 2012-09-19 12:15:32: NOTICE: pg_stop_backup complete, all required WAL
> segments have been archived
> 2012-09-19 12:15:32: pg_stop_backup
> 2012-09-19 12:15:32: ----------------
> 2012-09-19 12:15:32: 61/300000D8
> 2012-09-19 12:15:32: (1 row)
> 2012-09-19 12:15:32:
>
> While the sync was running (but after pg_stop_backup had completed) I pushed
> a bunch of traffic against the master server, which got the master to a
> current xlog location of:
> postgres=# select pg_current_xlog_location();
> pg_current_xlog_location
> --------------------------
> 61/6834C450
> (1 row)
>
> The startup of the slave after the sync completed:
> 2012-09-19 12:42:49.976 MDT [18791]: [1-1] LOG: database system was
> interrupted; last known up at 2012-09-19 12:15:31 MDT
> 2012-09-19 12:42:49.976 MDT [18791]: [2-1] LOG: creating missing WAL
> directory "pg_xlog/archive_status"
> 2012-09-19 12:42:50.143 MDT [18791]: [3-1] LOG: entering standby mode
> 2012-09-19 12:42:50.173 MDT [18792]: [1-1] LOG: streaming replication
> successfully connected to primary
> 2012-09-19 12:42:50.487 MDT [18791]: [4-1] LOG: redo starts at 61/30000020
> 2012-09-19 12:42:50.495 MDT [18791]: [5-1] LOG: consistent recovery state
> reached at 61/31000000
> 2012-09-19 12:42:50.495 MDT [18767]: [2-1] LOG: database system is ready to
> accept read only connections
>
> It shows the DB reached a consistent state as of 61/31000000, which is well
> behind the current location of the master (and the data files that were
> synced over to the slave). Monitoring the server showed the expected slave
> delay, which disappeared as the slave pulled and recovered from the WAL
> files that were generated after pg_stop_backup.
>
> But based on this, it looks like this procedure would leave an indeterminate
> window (depending on how much traffic the master processed while the slave
> was syncing) during which the slave couldn't be trusted for failover or
> querying: the server is up and running but is not actually in a consistent
> state.
>
> Thinking it through, the more complicated script version of the two-server
> recovery (where the first slave past the post runs pg_start_backup or
> pg_stop_backup) would also have this issue. Our failover slave would always
> be the one running pg_stop_backup, since it syncs faster, so at least it
> would always be consistent, but the DR slave would still have the problem.
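One way to narrow that untrusted window (a sketch, assuming a 9.x setup; hostnames are placeholders) is to record the master's xlog position after the sync finishes and only treat the slave as usable once its replay position has caught up:

```shell
# On the master: note the current write position after the sync completes.
psql -h master.example.com -c "SELECT pg_current_xlog_location();"

# On the slave: check how far recovery has actually replayed.
psql -h slave.example.com -c "SELECT pg_last_xlog_replay_location();"

# Only allow failover/queries against the slave once its replay location
# has reached (or passed) the master position captured above.
```

This doesn't make the slave consistent any sooner, but it gives the monitoring/failover tooling an explicit check instead of trusting the "consistent recovery state reached" log line.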
