Re: Vacuum ERRORs out considering freezing dead tuples from before OldestXmin

From: Alena Rybakina <lena(dot)ribackina(at)yandex(dot)ru>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Geoghegan <pg(at)bowt(dot)ie>, Andres Freund <andres(at)anarazel(dot)de>, Noah Misch <noah(at)leadboat(dot)com>
Subject: Re: Vacuum ERRORs out considering freezing dead tuples from before OldestXmin
Date: 2024-06-24 08:10:38
Message-ID: 082cad56-a0f7-46b9-b850-52f242907983@yandex.ru
Lists: pgsql-hackers

Hi, Melanie! I'm glad to hear that you have found the root cause of the
problem) Thank you for that!

On 21.06.2024 02:42, Melanie Plageman wrote:
> Hi,
>
> If vacuum fails to remove a tuple with xmax older than
> VacuumCutoffs->OldestXmin and younger than
> GlobalVisState->maybe_needed, it will ERROR out when determining
> whether or not to freeze the tuple with "cannot freeze committed
> xmax".
>
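
For reference, the ERROR above seems to come from the xmax sanity check
in heap_pre_freeze_checks() in heapam.c. A rough paraphrase follows;
flag and variable names are from memory and may not match the tree
exactly:

/* Rough paraphrase of the xmax check in heap_pre_freeze_checks() */
if (frz->checkflags & HEAP_FREEZE_CHECK_XMAX_ABORTED)
{
    TransactionId xmax = HeapTupleHeaderGetRawXmax(tuple);

    /*
     * Freezing planned to treat this xmax as aborted; if it actually
     * committed, we would be freezing a dead tuple, so error out.
     */
    Assert(TransactionIdIsNormal(xmax));
    if (unlikely(TransactionIdDidCommit(xmax)))
        ereport(ERROR,
                (errcode(ERRCODE_DATA_CORRUPTED),
                 errmsg_internal("cannot freeze committed xmax %u",
                                 xmax)));
}
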
> In back branches starting with 14, failing to remove tuples older than
> OldestXmin during pruning caused vacuum to infinitely loop in
> lazy_scan_prune(), as investigated on this [1] thread.
>
> On master, after 1ccc1e05ae removed the retry loop in
> lazy_scan_prune() and stopped comparing tuples to OldestXmin, the hang
> could no longer happen, but we can still attempt to freeze dead tuples
> visibly killed before OldestXmin -- resulting in an ERROR.
>
> Pruning may fail to remove dead tuples with xmax before OldestXmin if
> the tuple is not considered removable by GlobalVisState.
>
> For vacuum, the GlobalVisState is initially calculated at the
> beginning of vacuuming the relation -- at the same time and with the
> same value as VacuumCutoffs->OldestXmin.
>
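
For context, the cutoffs and the GlobalVisState are set up together at
the start of vacuum; roughly, in heap_vacuum_rel() (vacuumlazy.c),
with surrounding code omitted:

/* Sketch only; OldestXmin inside the cutoffs is fixed here for the
 * entire vacuum, while vistest may be refreshed again later. */
vacrel->aggressive = vacuum_get_cutoffs(rel, params, &vacrel->cutoffs);
...
vacrel->vistest = GlobalVisTestFor(rel);
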
> A backend's GlobalVisState may be updated again when it is accessed if
> a new snapshot is taken or if something caused ComputeXidHorizons() to
> be called.
>
> This can happen, for example, at the end of a round of index vacuuming
> when GetOldestNonRemovableTransactionId() is called.
>
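
And GetOldestNonRemovableTransactionId() recomputes the horizons every
time it is called, which is what refreshes the backend's
GlobalVisState. A rough sketch, from procarray.c (details may differ
between versions):

TransactionId
GetOldestNonRemovableTransactionId(Relation rel)
{
    ComputeXidHorizonsResult horizons;

    /* Also updates the backend's GlobalVis* state as a side effect. */
    ComputeXidHorizons(&horizons);

    switch (GlobalVisHorizonKindForRel(rel))
    {
        case VISHORIZON_SHARED:
            return horizons.shared_oldest_nonremovable;
        case VISHORIZON_CATALOG:
            return horizons.catalog_oldest_nonremovable;
        case VISHORIZON_DATA:
            return horizons.data_oldest_nonremovable;
        case VISHORIZON_TEMP:
            return horizons.temp_oldest_nonremovable;
    }

    return InvalidTransactionId;    /* unreachable; keeps compiler quiet */
}
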
> Normally this may result in GlobalVisState's horizon moving forward --
> potentially allowing more dead tuples to be removed.
>
> However, if a disconnected standby with a running transaction older
> than VacuumCutoffs->OldestXmin reconnects to the primary after vacuum
> initially calculates GlobalVisState and OldestXmin but before
> GlobalVisState is updated, the value of GlobalVisState->maybe_needed
> could go backwards.
>
> If this happens in the middle of vacuum's first pruning and freezing
> pass, it is possible that pruning/freezing could then encounter a
> tuple whose xmax is younger than GlobalVisState->maybe_needed and
> older than VacuumCutoffs->OldestXmin. heap_prune_satisfies_vacuum()
> would deem the tuple HEAPTUPLE_RECENTLY_DEAD and would not remove it.
> But the heap_pre_freeze_checks() would ERROR out with "cannot freeze
> committed xmax". This check is to avoid freezing dead tuples.
>
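
The decision path, roughly as it stands in
heap_prune_satisfies_vacuum() (pruneheap.c) on master, paraphrased:

/* Paraphrase of heap_prune_satisfies_vacuum(). The deleting xid is
 * handed back in dead_after, and removability is decided only
 * against GlobalVisState, never against OldestXmin. */
res = HeapTupleSatisfiesVacuumHorizon(tup, buffer, &dead_after);

if (res == HEAPTUPLE_RECENTLY_DEAD)
{
    /* If maybe_needed has moved backwards past dead_after, this
     * test says "keep" even though dead_after precedes OldestXmin. */
    if (GlobalVisTestIsRemovableXid(prstate->vistest, dead_after))
        res = HEAPTUPLE_DEAD;
}
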
> We can fix this by always removing tuples considered dead before
> VacuumCutoffs->OldestXmin. This is okay even if a reconnected standby
> has a transaction that sees that tuple as alive, because it will
> simply wait to replay the removal until it would be correct to do so
> or recovery conflict handling will cancel the transaction that sees
> the tuple as alive and allow replay to continue.

This is an interesting and difficult case) I noticed that when
initializing the cluster, in my opinion, we perform excessive freezing.
Initialization takes a long time, which can lead, for example, to
longer test execution. I got rid of this by adding a check that
OldestMxact is not FirstMultiXactId, and it works fine:

/* The patch's new check in heap_prune_satisfies_vacuum(), with my
 * added OldestMxact != FirstMultiXactId condition: */
if (prstate->cutoffs &&
    TransactionIdIsValid(prstate->cutoffs->OldestXmin) &&
    prstate->cutoffs->OldestMxact != FirstMultiXactId &&
    NormalTransactionIdPrecedes(dead_after, prstate->cutoffs->OldestXmin))
    return HEAPTUPLE_DEAD;

Can I keep it?

> Attached is the suggested fix for master plus a repro. I wrote it as a
> recovery suite TAP test, but I am _not_ proposing we add it to the
> ongoing test suite. It is, amongst other things, definitely prone to
> flaking. I also had to use loads of data to force two index vacuuming
> passes now that we have TIDStore, so it is a slow test.
>
> If you want to run the repro with meson, you'll have to add
> 't/099_vacuum_hang.pl' to src/test/recovery/meson.build and then run it with:
>
> meson test postgresql:recovery / recovery/099_vacuum_hang
>
> If you use autotools, you can run it with:
> make check PROVE_TESTS="t/099_vacuum_hang.pl"
>
> The repro forces a round of index vacuuming after the standby
> reconnects and before pruning a dead tuple whose xmax is older than
> OldestXmin.
>
> At the end of the round of index vacuuming, _bt_pendingfsm_finalize()
> calls GetOldestNonRemovableTransactionId(), thereby updating the
> backend's GlobalVisState and moving maybe_needed backwards.
>
> Then vacuum's first pass will continue with pruning and find our later
> inserted and updated tuple HEAPTUPLE_RECENTLY_DEAD when compared to
> maybe_needed but HEAPTUPLE_DEAD when compared to OldestXmin.
>
> I make sure that the standby reconnects between vacuum_get_cutoffs()
> and pruning because I have a cursor on the page keeping VACUUM FREEZE
> from getting a cleanup lock.
>
> See the repro for step-by-step explanations of how it works.
>
> I have a modified version of this that repros the infinite loop on
> 14-16 with substantially less data. See it here [2]. Also, the repro
> attached to this mail won't work on 14 and 15 because of changes to
> background_psql.
>
> [1] https://postgr.es/m/20240415173913.4zyyrwaftujxthf2%40awork3.anarazel.de#1b216b7768b5bd577a3d3d51bd5aadee
> [2] https://www.postgresql.org/message-id/CAAKRu_Y_NJzF4-8gzTTeaOuUL3CcGoXPjXcAHbTTygT8AyVqag%40mail.gmail.com
To be honest, the meson test framework is new to me, but I can see its
useful features. I think I will use it for checking my own features)

I couldn't understand why the replica is necessary here. Now I am
digging into why I got similar behavior without a replica, when I have
only one instance. I'm still checking this in my test, but I believe
this patch fixes the original problem, because the symptoms were the same.

--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
