Replicate consistent snapshot to a second PG instance using FDWs

From: Thomas Michael Engelke <thomas(dot)engelke(at)posteo(dot)de>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Replicate consistent snapshot to a second PG instance using FDWs
Date: 2022-01-05 14:12:14
Message-ID: d6d16850-f592-97e8-9c16-963de455ff9e@posteo.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


We have a setup where we run 2 locations with both locations containing
a full HA setup (using Patroni, etcd, Consul). Each location has 2 PG
servers + 1 witness node. At certain time intervals or on demand, our
customer would want to send the data from one location to the other over
the wire, either in full (overwriting the existing data) or selected
tables/rows (incrementally).

To simplify the used technology I am thinking of implementing all of
that in PostgreSQL itself; the alternative would be to code something,
probably in Python, that executes pg_dump/pg_restore using configuration
created specifically for the use case.

My idea is to execute everything in SQL, using fdw to create fdw tables
for each table that needs to be transferred (on the target server B1):

DROP TABLE tablename;
CREATE TABLE tablename_fdw SERVER A1;
CREATE TABLE tablename LIKE tablename_fdw INCLUDING ALL;

Then just pull the data over:

INSERT INTO tablename SELECT * FROM tablename_fdw;

Please ignore obvious possible performance optimizations.

However, the copy of the data needs to be consistent, so I would need to
have to copy a snapshot over. For a direct connection all is well
understood, but working with remote tables using fdw here not all is
well understood (by me).

I found a talk from PGCon 2020 from 2nd Quadrant where they state:

"Open a foreign transaction when FDW access the remote first time within
the local transaction"
"Foreign transaction uses SERIALIZABLE when the local transaction has
SERIALIZABLE, otherwise use REPEATABLE READ"
"This ensures that if a query performs multiple table scans on the
remote server, it will get snapshot-consistent results for all the scans"

Is this what I am looking for? Can I reliable query one table after the
other over multiple fdw tables from the same server and get snapshot
consistency in the same way a connection to one instance would grant me?

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2022-01-05 14:54:25 Re: How best to create and use associative array type in Postgres?
Previous Message Shaozhong SHI 2022-01-05 13:36:19 Re: Create and access a dictionary type