Logical replication issue

From: Aleš Zelený <zeleny(dot)ales(at)gmail(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Logical replication issue
Date: 2019-11-19 00:09:03
Message-ID: CAODqTUYjdjrzBQE0pSH=iq7J=TKOigUGAomqacsuKgTB+VEYyw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hello,

we are using logical replication from PostgreSQL 10.8 (Ubuntu
10.8-1.pgdg14.04+1) to PostgreSQL 11.3 (Ubuntu 11.3-1.pgdg18.04+1). Thre
are 1305 replicated tables within 3 schemas. The tables are from small
almost static lookup tables to larger tables up to ~40GB holding 44
millions of rows. In production, the replication works like a charm.

But we are experiencing an issue when we create clones of production
databases to the development environment. Development is in a separated
network, loaded from a testing data feed therefore development publisher
database contains data in almost identical volumes and the same structure,
but it is being continuously updated from the testing data feed.

To let all test work properly, replicated tables are truncated after
subscriber database is cloned to the development environment (all restored
subscriptions are dropped) and a new subscription to development publisher
database is created with copy_data = true option to ensure data
consistency.

Such an approach works well for more than a year using 2 of the present 3
schemas (most of the above-mentioned table count is in the long-used two
schemas). While we have added the third schema with only 7 tables from 35
MB to 40 GB in size, we are experiencing an initial synchronization issue.
The total volume of the three schemas is several hundreds of GB, so the
initial table synchronization lasts usually for several hours.
surprisingly, all of the 7 tables in the third schema, despite their size
or row count fail due to primary key violation, like - duplicate key value
violates unique constraint "party_pkey".
Such behavior makes me very surprised - based on the docs:
https://www.postgresql.org/docs/11/logical-replication-architecture.html,
<https://www.postgresql.org/docs/11/logical-replication-architecture.html>
31.5.1. Initial Snapshot, I expect that after some time necessary for the
table copy and afterward applied changes from WAL, there can't be a chance
for duplicate rows, since both the publisher and the subscriber are using
the same primary key as replica identity.

LOG: logical replication table synchronization worker for subscription
"sub_anl_dsos_usd", table "party" has started
ERROR: duplicate key value violates unique constraint "party_pkey"
DETAIL: Key (party_id)=(\x0003dd00131add0a) already exists.
CONTEXT: COPY party, line 1

If, once this happens (the table was truncated before the subscription was
created) I truncate the table on the subscriber, copy succeeded and from
that moment replication works like a charm. I'm not aware of using
replication different way then than for the two schemas we stared with and
for them, the synchronization including the application of changes
subsequent to subscription creation works without any issues.

Thanks for any hints on how to diagnose this issue, I can't find where I
did a mistake and I did not expect that this might be related to higher
change rate on publisher tables than the other tables.

Ales

Browse pgsql-general by date

  From Date Subject
Next Message keisuke kuroda 2019-11-19 01:42:38 Re: pg12 rc1 on CentOS8 depend python2
Previous Message Peter J. Holzer 2019-11-18 22:24:52 Re: REINDEX VERBOSE unknown option