Missing rows after logical replication in new primary

From: Lars Vonk <lars(dot)vonk(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Missing rows after logical replication in new primary
Date: 2020-12-18 08:36:53
Message-ID: CAMX1ThjXG-k7jVWy8QhC360gFn_xO+Nr6PPgDYQb5yeAMbZJFg@mail.gmail.com
Lists: pgsql-general

Hi,

We migrated from PostgreSQL 11 to 12 using logical replication. Today we
noticed that one table is missing 1252 rows after the replication finished
and we flipped to the new primary (we still have the old one, so we can recover).

We see that these rows were inserted into the table after we started the
initial copy of the table. Most of the missing rows (1230) seem to come from
inserts happening **during the initial copy**, and the rest (22) from inserts
**during the period the replication ran** (7 days).

This table is a (for us) high-volume table (more than 400,000,000 rows), with
more than 150,000 new inserts per day.

We took a per-table approach for the replication and this table was the
last table we started in our replication.

We did some sanity checks before we switched to the new primary, such as
comparing max(id) to see if the replica was up to date (including this
table) and comparing counts on some tables, and they all checked out okay.
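
To illustrate why a max(id) comparison can pass while rows are still missing,
here is a minimal Python sketch. It assumes the ids of both servers have
already been fetched into plain lists; the names `id_range_counts`,
`mismatched_ranges`, `old_ids` and `new_ids` are hypothetical, and the data is
made up for the example. It compares row counts per id range rather than only
the highest id, which narrows down where rows went missing.

```python
from typing import Dict, Iterable, List, Tuple


def id_range_counts(ids: Iterable[int], bucket_size: int) -> Dict[int, int]:
    """Count ids per bucket, where the bucket key is id // bucket_size."""
    counts: Dict[int, int] = {}
    for i in ids:
        key = i // bucket_size
        counts[key] = counts.get(key, 0) + 1
    return counts


def mismatched_ranges(
    old_ids: Iterable[int], new_ids: Iterable[int], bucket_size: int = 1000
) -> List[Tuple[int, int, int, int]]:
    """Return (range_start, range_end, old_count, new_count) for every
    id range whose row count differs between the two servers."""
    old_counts = id_range_counts(old_ids, bucket_size)
    new_counts = id_range_counts(new_ids, bucket_size)
    result = []
    for key in sorted(set(old_counts) | set(new_counts)):
        o = old_counts.get(key, 0)
        n = new_counts.get(key, 0)
        if o != n:
            result.append((key * bucket_size, (key + 1) * bucket_size - 1, o, n))
    return result


# Hypothetical data: the new primary lacks ids 7, 8 and 15,
# but the highest id is present on both sides.
old_ids = list(range(1, 21))
new_ids = [i for i in old_ids if i not in (7, 8, 15)]

print(max(old_ids) == max(new_ids))  # True: the max(id) check passes
print(mismatched_ranges(old_ids, new_ids, bucket_size=5))
# [(5, 9, 5, 3), (15, 19, 5, 4)]
```

On a table of this size the per-range counts would more likely be computed on
each server and only the counts compared, but the idea is the same: matching
max(id) values say nothing about gaps below the maximum.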

So how can this happen? For now it seems that only this table suffered from
it, but we are pretty 'scared' that more tables are affected, so we will have
to check them all.

Lars
