Quick Links

Re: [MASSMAIL]Re: Help with synchronous replication automation

From:	"Gilberto Castillo" <gilberto(dot)castillo(at)etecsa(dot)cu>
To:	"Keith" <keith(at)keithf4(dot)com>
Cc:	"Todd Nine" <todd(dot)nine(at)gmail(dot)com>, "pgsql-admin" <pgsql-admin(at)postgresql(dot)org>
Subject:	Re: [MASSMAIL]Re: Help with synchronous replication automation
Date:	2016-10-14 17:58:11
Message-ID:	59498.192.168.207.54.1476467891.squirrel@webmail.etecsa.cu
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-admin

Use 9.6

With synchronous_commit = 'remote_apply'

> On Thu, Oct 13, 2016 at 5:39 PM, Todd Nine <todd(dot)nine(at)gmail(dot)com> wrote:
>
>> Hi all,
>> I'm in the process of writing an application/api that will perform
>> similar functionality to RDS on Kuberentes using Postgres. For our
>> first
>> use case, I have the following setup.
>>
>> Master (0) --(synchronous replica) -> Standby 1
>>
>> --(synchronous replica) -> Standby 2
>>
>>
>> I have the following configuration in my postgresql.conf on the master.
>>
>> synchronous_standby_names = '1,2'
>>
>> And the following connections in standby 1 and standby 2, respectively.
>>
>>
>> standby_mode = on
>> primary_conninfo = 'host=postgres-toddtest-write port=5432
>> user=postgres
>> application_name=1'
>> trigger_file = '/tmp/postgresql.trigger.5432'
>> recovery_target_timeline=latest
>>
>>
>> standby_mode = on
>> primary_conninfo = 'host=postgres-toddtest-write port=5432
>> user=postgres
>> application_name=2'
>> trigger_file = '/tmp/postgresql.trigger.5432'
>> recovery_target_timeline=latest
>>
>> Replication appears to be working. In my master, I see the following in
>> my logs.
>>
>> LOG: database system is ready to accept connections
>> LOG: standby "1" is now the synchronous standby with priority 1
>> LOG: standby "2" is now the synchronous standby with priority 2
>> LOG: standby "2" is now the synchronous standby with priority 2
>> LOG: standby "1" is now the synchronous standby with priority 1
>>
>>
>> However, when I run the following on my slave nodes, the lag time seems
>> enormous.
>>
>> SELECT now() - pg_last_xact_replay_timestamp() AS time_lag;
>> time_lag
>> ----------------
>> 00:21:26.33019
>>
>> As a result, I have a few questions I can't seem to find the answers to
>> in
>> the documentation, any help would be greatly appreciated.
>>
>>
>> 1) In this doc, it states ". If the standby is the first matching
>> standby, as specified in synchronous_standby_names on the primary, the
>> reply messages from that standby will be used to wake users waiting for
>> confirmation that the commit record has been received."
>>
>> https://www.postgresql.org/docs/9.5/static/warm-standby.html
>>
>> My understanding is that means if Standby 1 has successfully fsynced
>> that
>> commit to disk, the server will return a response to the client as a
>> successful commit. What happens to Standby 2? I'm assuming it's sent
>> the
>> same WAL entry asynchronously, but wanted to be sure.
>>
>
> I believe you have this correct. I'd still get confirmation on this from
> someone else, though. As of 9.6, you also have the option of requiring a
> minimum number of synchronous standbys that must respond. Before 9.6, it
> just went down the list in order and the first one to respond was all that
> was necessary for the master to confirm the transaction.
>
> https://www.postgresql.org/docs/9.6/static/warm-standby.html#SYNCHRONOUS-REPLICATION
>
>
>>
>> 2) How can I validate the replication latency in standbys that are NOT
>> the
>> current hot standby? For instance, I want to deploy or upgrade the PG
>> nodes automatically. I would add more hot standbys, wait for them to
>> catch
>> up. I would then fail over to one of them as the new master node. Once
>> this is complete and working, I'd remove the old master and replicas.
>>
>> In order to automate 2), I need some way to verify that the standby
>> that's
>> just been created is up to date, as well as receiving the latest
>> traffic.
>>
>> Thanks in advance!
>> Todd
>>
>>
>>
> To more reliably get the status of the slaves, you'll want to query from
> the master for the byte lag vs querying the slave for the last transaction
> replay. If the master is getting no writes, then checking for replay on
> the
> slave will give a false report that it is falling behind simply because
> it's run no transactions. I've explained this in more detail and have some
> example queries on my blog
>
> https://www.keithf4.com/monitoring_streaming_slave_lag/
>
> It's still a good monitor to have, though, and I recommend monitoring both
> byte lag from the master and replay from the slave. Just set your
> monitoring alerts appropriately
>

--
Saludos,
Gilberto Castillo
ETECSA, La Habana, Cuba

In response to

Re: Help with synchronous replication automation at 2016-10-14 16:05:37 from Keith

Browse pgsql-admin by date

	From	Date	Subject
Next Message	Poul Kristensen	2016-10-17 18:35:29	Re: PostgresSQL 9.5 and systemd autorestart but without replication.
Previous Message	Keith	2016-10-14 16:05:37	Re: Help with synchronous replication automation