Quick Links

Re: long-standing data loss bug in initial sync of logical replication

From:	Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: long-standing data loss bug in initial sync of logical replication
Date:	2023-11-18 10:56:47
Message-ID:	49958b83-009a-9360-2969-ea42cd15fff7@enterprisedb.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 11/18/23 03:54, Andres Freund wrote:
> Hi,
>
> On 2023-11-17 17:54:43 -0800, Andres Freund wrote:
>> On 2023-11-17 15:36:25 +0100, Tomas Vondra wrote:
>>> Overall, this looks, walks and quacks like a cache invalidation issue,
>>> likely a missing invalidation somewhere in the ALTER PUBLICATION code.
>
> I can confirm that something is broken with invalidation handling.
>
> To test this I just used pg_recvlogical to stdout. It's just interesting
> whether something arrives, that's easy to discern even with binary output.
>
> CREATE PUBLICATION pb;
> src/bin/pg_basebackup/pg_recvlogical --plugin=pgoutput --start --slot test -d postgres -o proto_version=4 -o publication_names=pb -o messages=true -f -
>
> S1: CREATE TABLE d(data text not null);
> S1: INSERT INTO d VALUES('d1');
> S2: BEGIN; INSERT INTO d VALUES('d2');
> S1: ALTER PUBLICATION pb ADD TABLE d;
> S2: COMMIT
> S2: INSERT INTO d VALUES('d3');
> S1: INSERT INTO d VALUES('d4');
> RL: <nothing>
>
> Without the 'd2' insert in an in-progress transaction, pgoutput *does* react
> to the ALTER PUBLICATION.
>
> I think the problem here is insufficient locking. The ALTER PUBLICATION pb ADD
> TABLE d basically modifies the catalog state of 'd', without a lock preventing
> other sessions from having a valid cache entry that they could continue to
> use. Due to this, decoding S2's transactions that started before S2's commit,
> will populate the cache entry with the state as of the time of S1's last
> action, i.e. no need to output the change.
>
> The reason this can happen is because OpenTableList() uses
> ShareUpdateExclusiveLock. That allows the ALTER PUBLICATION to happen while
> there's an ongoing INSERT.
>

I guess this would also explain why changing the lock mode from
ShareUpdateExclusiveLock to ShareRowExclusiveLock changes the behavior.
INSERT acquires RowExclusiveLock, which doesn't conflict only with the
latter.

> I think this isn't just a logical decoding issue. S2's cache state just after
> the ALTER PUBLICATION is going to be wrong - the table is already locked,
> therefore further operations on the table don't trigger cache invalidation
> processing - but the catalog state *has* changed. It's a bigger problem for
> logical decoding though, as it's a bit more lazy about invalidation processing
> than normal transactions, allowing the problem to persist for longer.
>

Yeah. I'm wondering if there's some other operation acquiring a lock
weaker than RowExclusiveLock that might be affected by this. Because
then we'd need to get an even stronger lock ...

>
> I guess it's not really feasible to just increase the lock level here though
> :(. The use of ShareUpdateExclusiveLock isn't new, and suddenly using AEL
> would perhaps lead to new deadlocks and such? But it also seems quite wrong.
>

If this really is about the lock being too weak, then I don't see why
would it be wrong? If it's required for correctness, it's not really
wrong, IMO. Sure, stronger locks are not great ...

I'm not sure about the risk of deadlocks. If you do

ALTER PUBLICATION ... ADD TABLE

it's not holding many other locks. It essentially gets a lock just a
lock on pg_publication catalog, and then the publication row. That's it.

If we increase the locks from ShareUpdateExclusive to ShareRowExclusive,
we're making it conflict with RowExclusive. Which is just DML, and I
think we need to do that.

So maybe that's fine? For me, a detected deadlock is better than
silently missing some of the data.

>
> We could brute force this in the logical decoding infrastructure, by
> distributing invalidations from catalog modifying transactions to all
> concurrent in-progress transactions (like already done for historic catalog
> snapshot, c.f. SnapBuildDistributeNewCatalogSnapshot()). But I think that'd
> be a fairly significant increase in overhead.
>

I have no idea what the overhead would be - perhaps not too bad,
considering catalog changes are not too common (I'm sure there are
extreme cases). And maybe we could even restrict this only to
"interesting" catalogs, or something like that? (However I hate those
weird differences in behavior, it can easily lead to bugs.)

But it feels more like a band-aid than actually fixing the issue.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Re: long-standing data loss bug in initial sync of logical replication at 2023-11-18 02:54:45 from Andres Freund

Responses

Re: long-standing data loss bug in initial sync of logical replication at 2023-11-18 18:12:57 from Andres Freund

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Amit Kapila	2023-11-18 11:24:55	Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)
Previous Message	Amit Kapila	2023-11-18 10:45:57	Re: Synchronizing slots from primary to standby