From: Tomas Vondra <tomas(at)vondra(dot)me>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4
Date: 2024-11-07 18:39:55
Message-ID: 8f334886-d69a-4e68-acb5-961390f3ed44@vondra.me
Lists: pgsql-hackers
Hi,
I kept investigating this, but I haven't made much progress. I still
don't understand why it would be OK to move any of the LSN fields
backwards - certainly not fields like confirmed_flush or restart_lsn.

I did a simple experiment - I added asserts to the couple of places in
logical.c updating the LSN fields, checking that the value never
decreases. But then I simply ran make check-world instead of the stress
test, and that actually fails too - 040_standby_failover_slots_sync.pl
trips this assert:
    {
        SpinLockAcquire(&MyReplicationSlot->mutex);
        Assert(MyReplicationSlot->data.confirmed_flush <= lsn);
        MyReplicationSlot->data.confirmed_flush = lsn;
        SpinLockRelease(&MyReplicationSlot->mutex);
    }
So something does move confirmed_flush backwards, albeit only by a tiny
amount (I've seen a ~56 byte difference). I don't have an example of
this causing an issue in practice, but I note that
CheckPointReplicationSlots does this:
    if (is_shutdown && SlotIsLogical(s))
    {
        SpinLockAcquire(&s->mutex);

        if (s->data.invalidated == RS_INVAL_NONE &&
            s->data.confirmed_flush > s->last_saved_confirmed_flush)
        {
            s->just_dirtied = true;
            s->dirty = true;
        }

        SpinLockRelease(&s->mutex);
    }
to determine whether a slot needs to be flushed to disk during a
checkpoint. So I guess it's possible we save a slot to disk at some
LSN, then confirmed_flush moves backwards, and we later fail to sync
the slot to disk even though its in-memory state no longer matches
what's on disk. But I don't have a reproducer for this ...
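
If confirmed_flush really is never supposed to move backwards, the
simplest band-aid I can think of (just a sketch of the idea, not
something I've tested) would be to clamp the update in
LogicalConfirmReceivedLocation instead of letting it go back:

    SpinLockAcquire(&MyReplicationSlot->mutex);

    /* only ever advance confirmed_flush, never move it back */
    if (MyReplicationSlot->data.confirmed_flush < lsn)
        MyReplicationSlot->data.confirmed_flush = lsn;

    SpinLockRelease(&MyReplicationSlot->mutex);

Of course, that would only hide the symptom - it doesn't explain where
the stale LSN comes from in the first place.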
I also noticed a strange difference between LogicalIncreaseXminForSlot
and LogicalIncreaseRestartDecodingForSlot.
The structure of LogicalIncreaseXminForSlot looks like this:
    if (TransactionIdPrecedesOrEquals(xmin, slot->data.catalog_xmin))
    {
    }
    else if (current_lsn <= slot->data.confirmed_flush)
    {
        ... update candidate fields ...
    }
    else if (slot->candidate_xmin_lsn == InvalidXLogRecPtr)
    {
        ... update candidate fields ...
    }
while LogicalIncreaseRestartDecodingForSlot looks like this:
    if (restart_lsn <= slot->data.restart_lsn)
    {
    }
    else if (current_lsn <= slot->data.confirmed_flush)
    {
        ... update candidate fields ...
    }
    if (slot->candidate_restart_valid == InvalidXLogRecPtr)
    {
        ... update candidate fields ...
    }
Notice that LogicalIncreaseXminForSlot has the third block guarded by
"else if", while LogicalIncreaseRestartDecodingForSlot has a plain
"if". Isn't that a bit suspicious, considering the two functions do
essentially the same thing, just for different fields? I don't know
whether this is dangerous; the comments suggest it may just waste some
extra effort after a reconnect.
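
If the asymmetry is unintended, making the two functions match would
presumably just mean turning that last block into an "else if" (again,
only a sketch - I haven't checked what it changes in practice):

    if (restart_lsn <= slot->data.restart_lsn)
    {
    }
    else if (current_lsn <= slot->data.confirmed_flush)
    {
        ... update candidate fields ...
    }
    else if (slot->candidate_restart_valid == InvalidXLogRecPtr)
    {
        ... update candidate fields ...
    }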
regards
--
Tomas Vondra