Help with synchronous replication automation

From: Todd Nine <todd(dot)nine(at)gmail(dot)com>
To: pgsql-admin(at)postgresql(dot)org
Subject: Help with synchronous replication automation
Date: 2016-10-13 21:39:19
Message-ID: CAHLkSErow+o0+a9DK+Qo+GasoRSJrzcZ50G07YWRev8ux1TyeQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Hi all,
I'm in the process of writing an application/api that will perform
similar functionality to RDS on Kuberentes using Postgres. For our first
use case, I have the following setup.

Master (0) --(synchronous replica) -> Standby 1

--(synchronous replica) -> Standby 2

I have the following configuration in my postgresql.conf on the master.

synchronous_standby_names = '1,2'

And the following connections in standby 1 and standby 2, respectively.

standby_mode = on
primary_conninfo = 'host=postgres-toddtest-write port=5432 user=postgres
application_name=1'
trigger_file = '/tmp/postgresql.trigger.5432'
recovery_target_timeline=latest

standby_mode = on
primary_conninfo = 'host=postgres-toddtest-write port=5432 user=postgres
application_name=2'
trigger_file = '/tmp/postgresql.trigger.5432'
recovery_target_timeline=latest

Replication appears to be working. In my master, I see the following in my
logs.

LOG: database system is ready to accept connections
LOG: standby "1" is now the synchronous standby with priority 1
LOG: standby "2" is now the synchronous standby with priority 2
LOG: standby "2" is now the synchronous standby with priority 2
LOG: standby "1" is now the synchronous standby with priority 1

However, when I run the following on my slave nodes, the lag time seems
enormous.

SELECT now() - pg_last_xact_replay_timestamp() AS time_lag;
time_lag
----------------
00:21:26.33019

As a result, I have a few questions I can't seem to find the answers to in
the documentation, any help would be greatly appreciated.

1) In this doc, it states ". If the standby is the first matching standby,
as specified in synchronous_standby_names on the primary, the reply
messages from that standby will be used to wake users waiting for
confirmation that the commit record has been received."

https://www.postgresql.org/docs/9.5/static/warm-standby.html

My understanding is that means if Standby 1 has successfully fsynced that
commit to disk, the server will return a response to the client as a
successful commit. What happens to Standby 2? I'm assuming it's sent the
same WAL entry asynchronously, but wanted to be sure.

2) How can I validate the replication latency in standbys that are NOT the
current hot standby? For instance, I want to deploy or upgrade the PG
nodes automatically. I would add more hot standbys, wait for them to catch
up. I would then fail over to one of them as the new master node. Once
this is complete and working, I'd remove the old master and replicas.

In order to automate 2), I need some way to verify that the standby that's
just been created is up to date, as well as receiving the latest traffic.

Thanks in advance!
Todd

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Poul Kristensen 2016-10-14 11:42:41 Re: PostgresSQL 9.5 and systemd autorestart but without replication.
Previous Message Joshua D. Drake 2016-10-13 16:10:47 Re: Distribution shipped vs Postgres Repository