From: Tomas Vondra <tomas(at)vondra(dot)me>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4
Date: 2024-11-07 18:39:55
Message-ID: 8f334886-d69a-4e68-acb5-961390f3ed44@vondra.me
Lists: pgsql-hackers

Hi,

I kept investigating this, but I haven't made much progress. I still
don't understand why it would be OK to move any of the LSN fields
backwards - certainly not for fields like confirmed_flush or restart_lsn.

I did a simple experiment - I added asserts to the couple of places in
logical.c that update the LSN fields, checking that the value only ever
increases. But then I simply ran make check-world, instead of the stress
test.

And that actually fails too: 040_standby_failover_slots_sync.pl triggers
this

    {
        SpinLockAcquire(&MyReplicationSlot->mutex);
        Assert(MyReplicationSlot->data.confirmed_flush <= lsn);
        MyReplicationSlot->data.confirmed_flush = lsn;
        SpinLockRelease(&MyReplicationSlot->mutex);
    }

So this moves confirmed_flush back, albeit only by a tiny amount (I've
seen a ~56-byte difference). I don't have an example of this causing an
issue in practice, but I note that CheckPointReplicationSlots does this:

    if (is_shutdown && SlotIsLogical(s))
    {
        SpinLockAcquire(&s->mutex);

        if (s->data.invalidated == RS_INVAL_NONE &&
            s->data.confirmed_flush > s->last_saved_confirmed_flush)
        {
            s->just_dirtied = true;
            s->dirty = true;
        }
        SpinLockRelease(&s->mutex);
    }

to determine whether a slot needs to be flushed to disk during a
checkpoint. So I guess it's possible we save a slot to disk at some LSN,
then confirmed_flush moves backwards, and we fail to sync the slot to
disk.

But I don't have a reproducer for this ...

I also noticed a strange difference between LogicalIncreaseXminForSlot
and LogicalIncreaseRestartDecodingForSlot.

The structure of LogicalIncreaseXminForSlot looks like this:

    if (TransactionIdPrecedesOrEquals(xmin, slot->data.catalog_xmin))
    {
    }
    else if (current_lsn <= slot->data.confirmed_flush)
    {
        ... update candidate fields ...
    }
    else if (slot->candidate_xmin_lsn == InvalidXLogRecPtr)
    {
        ... update candidate fields ...
    }

while LogicalIncreaseRestartDecodingForSlot looks like this:

    if (restart_lsn <= slot->data.restart_lsn)
    {
    }
    else if (current_lsn <= slot->data.confirmed_flush)
    {
        ... update candidate fields ...
    }

    if (slot->candidate_restart_valid == InvalidXLogRecPtr)
    {
        ... update candidate fields ...
    }

Notice that LogicalIncreaseXminForSlot has the third block guarded by
"else if", while LogicalIncreaseRestartDecodingForSlot has a plain "if".
Isn't that a bit suspicious, considering the two functions do the same
thing, just for different fields? I don't know whether this is
dangerous; the comments suggest it may just waste extra effort after a
reconnect.

regards

--
Tomas Vondra
