Logical replication & oldest XID.

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>
Subject: Logical replication & oldest XID.
Date: 2016-05-31 14:52:46
Message-ID: 574DA53E.1010806@postgrespro.ru
Lists: pgsql-hackers

Hi,

We are using logical replication in multimaster and have run into an
interesting problem with a "frozen" procArray->replication_slot_xmin.
This variable is set by ProcArraySetReplicationSlotXmin, which is
invoked by ReplicationSlotsComputeRequiredXmin, which in turn is called
by LogicalConfirmReceivedLocation. If transactions are executed on all
nodes of the multimaster, everything works fine: replication_slot_xmin
advances. But if we send transactions to only one multimaster node and
broadcast these changes to the other nodes, then no data is sent through
the replication slots on those nodes. No data sent means no
confirmations: LogicalConfirmReceivedLocation is never called, and
procArray->replication_slot_xmin keeps its original value, 599.
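To make the dependency concrete, here is a toy Python model. It is
hypothetical and not PostgreSQL source: ModelSlot, ModelProcArray,
compute_required_xmin and confirm_received are invented stand-ins for the
slot structures, for ReplicationSlotsComputeRequiredXmin plus
ProcArraySetReplicationSlotXmin, and for LogicalConfirmReceivedLocation.
It assumes, as described above, that a slot's xmin moves forward only when
the receiver confirms, and that the global value is the minimum over all
slots. The later XID 700 is an arbitrary assumption.

```python
from dataclasses import dataclass

# Toy model only; names are hypothetical stand-ins, not PostgreSQL source.
INVALID_XID = 0  # PostgreSQL also reserves 0 as InvalidTransactionId


@dataclass
class ModelSlot:
    name: str
    xmin: int  # oldest XID this slot still protects
    candidate_xmin: int = INVALID_XID  # becomes xmin once the receiver confirms


class ModelProcArray:
    def __init__(self, slots):
        self.slots = {s.name: s for s in slots}
        self.replication_slot_xmin = INVALID_XID
        self.compute_required_xmin()

    def compute_required_xmin(self):
        # Stand-in for ReplicationSlotsComputeRequiredXmin followed by
        # ProcArraySetReplicationSlotXmin: the minimum xmin over all slots.
        # Plain min() ignores XID wraparound (PostgreSQL uses TransactionIdPrecedes).
        xmins = [s.xmin for s in self.slots.values() if s.xmin != INVALID_XID]
        self.replication_slot_xmin = min(xmins, default=INVALID_XID)

    def confirm_received(self, name):
        # Stand-in for LogicalConfirmReceivedLocation: in this model it is the
        # only path that moves a slot's xmin forward.
        slot = self.slots[name]
        if slot.candidate_xmin != INVALID_XID:
            slot.xmin = slot.candidate_xmin
            slot.candidate_xmin = INVALID_XID
            self.compute_required_xmin()


if __name__ == "__main__":
    # One slot per peer, both starting at XID 599 (the value from the report).
    pa = ModelProcArray([ModelSlot("peer_a", 599), ModelSlot("peer_b", 599)])
    for s in pa.slots.values():
        s.candidate_xmin = 700  # hypothetical XID that decoding has moved past
    print(pa.replication_slot_xmin)  # 599: nothing confirmed yet
    pa.confirm_received("peer_a")
    print(pa.replication_slot_xmin)  # 599: peer_b sent nothing, so no confirmation
    pa.confirm_received("peer_b")
    print(pa.replication_slot_xmin)  # 700
```

Because the global value is a minimum, a single slot that never sees
traffic (and therefore never confirms) is enough to pin it.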

As a result, GetOldestXmin always returns 599, so autovacuum is
effectively blocked and our multimaster cannot clean up its XID->CSN
map, which causes shared memory overflow. This happens only when write
transactions are sent to just one node, or when there are no write
transactions at all.
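A similarly simplified sketch of why the pinned value blocks cleanup. The
hypothetical oldest_xmin below treats the horizon as the minimum of the
next XID to be assigned, the xmins of running transactions and
replication_slot_xmin, which is only a rough approximation of what
GetOldestXmin considers. prune_xid_csn_map is an invented stand-in for the
multimaster XID->CSN map cleanup, whose real implementation is not shown
here.

```python
# Simplified, hypothetical model; not PostgreSQL or multimaster source.
INVALID_XID = 0


def oldest_xmin(next_xid, running_xmins, replication_slot_xmin):
    # Rough analogue of GetOldestXmin: start from the next XID to be assigned
    # and clamp to every running transaction's xmin and to the slot xmin.
    # Plain min() ignores XID wraparound.
    horizon = next_xid
    for xmin in running_xmins:
        horizon = min(horizon, xmin)
    if replication_slot_xmin != INVALID_XID:
        horizon = min(horizon, replication_slot_xmin)
    return horizon


def prune_xid_csn_map(xid_csn, horizon):
    # Hypothetical cleanup: entries for XIDs older than the horizon can go.
    for xid in [x for x in xid_csn if x < horizon]:
        del xid_csn[xid]


xid_csn = {xid: xid for xid in range(600, 5000)}  # CSN values are placeholders
prune_xid_csn_map(xid_csn, oldest_xmin(5000, [], 599))
print(len(xid_csn))  # 4400: a slot xmin stuck at 599 prevents any pruning
prune_xid_csn_map(xid_csn, oldest_xmin(5000, [], INVALID_XID))
print(len(xid_csn))  # 0: with no slot holding it back, everything is removable
```

Even with no running transactions at all, the stuck slot xmin alone keeps
the horizon at 599, so the map only grows.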

Before implementing a workaround (for example, forcing a call to
ReplicationSlotsComputeRequiredXmin), I want to understand whether this
is a real problem in logical replication or whether we are doing
something wrong. BDR should face the same problem if all updates are
performed on one node...

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
