Re: Combine Prune and Freeze records emitted by vacuum

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>
Subject: Re: Combine Prune and Freeze records emitted by vacuum
Date: 2024-03-25 13:01:58
Message-ID: 923484dc-07f2-403b-aece-b01e94b4a9d2@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 24/03/2024 21:55, Melanie Plageman wrote:
> On Sat, Mar 23, 2024 at 01:09:30AM +0200, Heikki Linnakangas wrote:
>> On 20/03/2024 21:17, Melanie Plageman wrote:
>> There is another patch in the commitfest that touches this area: "Recording
>> whether Heap2/PRUNE records are from VACUUM or from opportunistic pruning"
>> [1]. That actually goes in the opposite direction than this patch. That
>> patch wants to add more information, to show whether a record was emitted by
>> VACUUM or on-access pruning, while this patch makes the freezing and
>> VACUUM's 2nd phase records also look the same. We could easily add more
>> flags to xl_heap_prune to distinguish them. Or assign different xl_info code
>> for them, like that other patch proposed. But I don't think that needs to
>> block this patch, that can be added as a separate patch.
>>
>> [1] https://www.postgresql.org/message-id/CAH2-Wzmsevhox+HJpFmQgCxWWDgNzP0R9F+VBnpOK6TgxMPrRw@mail.gmail.com
>
> I think it would be very helpful to distinguish amongst vacuum pass 1,
> 2, and on-access pruning. I often want to know what did most of the
> pruning -- and I could see also wanting to know if the first or second
> vacuum pass was responsible for removing the tuples. I agree it could be
> done separately, but it would be very helpful to have as soon as
> possible now that the record type will be the same for all three.

Ok, I used separate 'info' codes for records generated on on-access
pruning and vacuum's 1st and 2nd pass. Similar to Peter's patch on that
other thread, except that I didn't reserve the whole high bit for this,
but used three different 'info' codes. Freezing uses the same
XLOG_HEAP2_PRUNE_VACUUM_SCAN as the pruning in vacuum's 1st pass. You
can distinguish them based on whether the record has nfrozen > 0, and
with the rest of the patches, they will be merged anyway.

>> From 042185d3de14dcb7088bbe50e9c64e365ac42c2a Mon Sep 17 00:00:00 2001
>> From: Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi>
>> Date: Fri, 22 Mar 2024 23:10:22 +0200
>> Subject: [PATCH v6] Merge prune, freeze and vacuum WAL record formats
>>
>> /*
>> - * Handles XLOG_HEAP2_PRUNE record type.
>> - *
>> - * Acquires a full cleanup lock.
>> + * Replay XLOG_HEAP2_PRUNE_FREEZE record.
>> */
>> static void
>> -heap_xlog_prune(XLogReaderState *record)
>> +heap_xlog_prune_freeze(XLogReaderState *record)
>> {
>> XLogRecPtr lsn = record->EndRecPtr;
>> - xl_heap_prune *xlrec = (xl_heap_prune *) XLogRecGetData(record);
>> + char *ptr;
>> + xl_heap_prune *xlrec;
>> Buffer buffer;
>> RelFileLocator rlocator;
>> BlockNumber blkno;
>> XLogRedoAction action;
>>
>> XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
>> + ptr = XLogRecGetData(record);
>
> I don't love having two different pointers that alias each other and we
> don't know which one is for what. Perhaps we could memcpy xlrec like in
> my attached diff (log_updates.diff). It also might perform slightly
> better than accessing flags through a xl_heap_prune
> * -- since it wouldn't be doing pointer dereferencing.

Ok.

>> /*
>> - * We're about to remove tuples. In Hot Standby mode, ensure that there's
>> - * no queries running for which the removed tuples are still visible.
>> + * We will take an ordinary exclusive lock or a cleanup lock depending on
>> + * whether the XLHP_CLEANUP_LOCK flag is set. With an ordinary exclusive
>> + * lock, we better not be doing anything that requires moving existing
>> + * tuple data.
>> */
>> - if (InHotStandby)
>> - ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
>> - xlrec->isCatalogRel,
>> + Assert((xlrec->flags & XLHP_CLEANUP_LOCK) != 0 ||
>> + (xlrec->flags & (XLHP_HAS_REDIRECTIONS | XLHP_HAS_DEAD_ITEMS)) == 0);
>> +
>> + /*
>> + * We are about to remove and/or freeze tuples. In Hot Standby mode,
>> + * ensure that there are no queries running for which the removed tuples
>> + * are still visible or which still consider the frozen xids as running.
>> + * The conflict horizon XID comes after xl_heap_prune.
>> + */
>> + if (InHotStandby && (xlrec->flags & XLHP_HAS_CONFLICT_HORIZON) != 0)
>> + {
>
> My attached patch has a TODO here for the comment. It sticks out that
> the serialization and deserialization conditions are different for the
> snapshot conflict horizon. We don't deserialize if InHotStandby is
> false. That's still correct now because we don't write anything else to
> the main data chunk afterward. But, if someone were to add another
> member after snapshot_conflict_horizon, they would want to know to
> deserialize snapshot_conflict_horizon first even if InHotStandby is
> false.

Good point. Fixed so that 'snapshot_conflict_horizon' is deserialized
even if !InHotStandby. A memcpy is cheap, and is probably optimized away
by the compiler anyway.

>> + TransactionId snapshot_conflict_horizon;
>> +
>
> I would throw a comment in about the memcpy being required because the
> snapshot_conflict_horizon is in the buffer unaligned.

Added.

>> +/*
>> + * Given a MAXALIGNed buffer returned by XLogRecGetBlockData() and pointed to
>> + * by cursor and any xl_heap_prune flags, deserialize the arrays of
>> + * OffsetNumbers contained in an XLOG_HEAP2_PRUNE_FREEZE record.
>> + *
>> + * This is in heapdesc.c so it can be shared between heap2_redo and heap2_desc
>> + * code, the latter of which is used in frontend (pg_waldump) code.
>> + */
>> +void
>> +heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
>> + int *nredirected, OffsetNumber **redirected,
>> + int *ndead, OffsetNumber **nowdead,
>> + int *nunused, OffsetNumber **nowunused,
>> + int *nplans, xlhp_freeze_plan **plans,
>> + OffsetNumber **frz_offsets)
>> +{
>> + if (flags & XLHP_HAS_FREEZE_PLANS)
>> + {
>> + xlhp_freeze_plans *freeze_plans = (xlhp_freeze_plans *) cursor;
>> +
>> + *nplans = freeze_plans->nplans;
>> + Assert(*nplans > 0);
>> + *plans = freeze_plans->plans;
>> +
>> + cursor += offsetof(xlhp_freeze_plans, plans);
>> + cursor += sizeof(xlhp_freeze_plan) * *nplans;
>> + }
>
> I noticed you decided to set these in the else statements. Is that to
> emphasize that it is important to correctness that they be properly
> initialized?

The point was to always initialize *nplans et al in the function. You
required the caller to initialize them to zero, but that seems error-prone.

I made one more last minute change: I changed the order of the array
arguments in heap_xlog_deserialize_prune_and_freeze() to match the order
in log_heap_prune_and_freeze().

Committed with the above little changes. Thank you! Now, back to the
rest of the patches :-).

--
Heikki Linnakangas
Neon (https://neon.tech)

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Euler Taveira 2024-03-25 13:02:01 Re: speed up a logical replica setup
Previous Message Peter Eisentraut 2024-03-25 12:55:24 Re: speed up a logical replica setup