Re: Combine Prune and Freeze records emitted by vacuum

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>
Subject: Re: Combine Prune and Freeze records emitted by vacuum
Date: 2024-03-24 19:55:22
Message-ID: 20240324195522.uv5xxwfdmvpjskxr@liskov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Mar 23, 2024 at 01:09:30AM +0200, Heikki Linnakangas wrote:
> On 20/03/2024 21:17, Melanie Plageman wrote:
> > Attached patch changes-for-0001.patch has a bunch of updated comments --
> > especially for heapam_xlog.h (including my promised diagram). And a few
> > suggestions (mostly changes that I should have made before).
>
> New version of these WAL format changes attached. Squashed to one patch now.
> I spent more time on the comments throughout the patch. And one
> notable code change: I replaced the XLHP_LP_TRUNCATE_ONLY flag with
> XLHP_CLEANUP_LOCK. XLHP_CLEANUP_LOCK directly indicates if you need a
> cleanup lock to replay the record. It must always be set when
> XLHP_HAS_REDIRECTIONS or XLHP_HAS_DEAD_ITEMS is set, because replaying those
> always needs a cleanup lock. That felt easier to document and understand
> than XLHP_LP_TRUNCATE_ONLY.

Makes sense to me.

> > > From b26e36ba8614d907a6e15810ed4f684f8f628dd2 Mon Sep 17 00:00:00 2001
> > > From: Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi>
> > > Date: Wed, 20 Mar 2024 14:53:31 +0200
> > > Subject: [PATCH v5 08/26] minor refactoring in log_heap_prune_and_freeze()
> > >
> > > Mostly to make local variables more tightly-scoped.
> >
> > So, I don't think you can move those sub-records into the tighter scope.
> > If you run tests with this applied, you'll see it crashes and a lot of
> > the data in the record is wrong. If you move the sub-record declarations
> > out to a wider scope, the tests pass.
> >
> > The WAL record data isn't actually copied into the buffer until
> >
> > recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_PRUNE_FREEZE);
> >
> > after registering everything.
> > So all of those sub-records you made are out of scope the time it tries
> > to copy from them.
> >
> > I spent the better part of a day last week trying to figure out what was
> > happening after I did the exact same thing. I must say that I found the
> > xloginsert API incredibly unhelpful on this point.
>
> Oops. I had that in mind and that was actually why I moved the
> XLogRegisterData() call to the end of the function, because I found it
> confusing to register the struct before filling it in completely, even
> though it works perfectly fine. But then I missed it anyway when I moved the
> local variables. I added a brief comment on that.

Comment was a good idea.

> There is another patch in the commitfest that touches this area: "Recording
> whether Heap2/PRUNE records are from VACUUM or from opportunistic pruning"
> [1]. That actually goes in the opposite direction than this patch. That
> patch wants to add more information, to show whether a record was emitted by
> VACUUM or on-access pruning, while this patch makes the freezing and
> VACUUM's 2nd phase records also look the same. We could easily add more
> flags to xl_heap_prune to distinguish them. Or assign different xl_info code
> for them, like that other patch proposed. But I don't think that needs to
> block this patch, that can be added as a separate patch.
>
> [1] https://www.postgresql.org/message-id/CAH2-Wzmsevhox+HJpFmQgCxWWDgNzP0R9F+VBnpOK6TgxMPrRw@mail.gmail.com

I think it would be very helpful to distinguish amongst vacuum pass 1,
2, and on-access pruning. I often want to know what did most of the
pruning -- and I could see also wanting to know if the first or second
vacuum pass was responsible for removing the tuples. I agree it could be
done separately, but it would be very helpful to have as soon as
possible now that the record type will be the same for all three.

> From 042185d3de14dcb7088bbe50e9c64e365ac42c2a Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi>
> Date: Fri, 22 Mar 2024 23:10:22 +0200
> Subject: [PATCH v6] Merge prune, freeze and vacuum WAL record formats
>
> /*
> - * Handles XLOG_HEAP2_PRUNE record type.
> - *
> - * Acquires a full cleanup lock.
> + * Replay XLOG_HEAP2_PRUNE_FREEZE record.
> */
> static void
> -heap_xlog_prune(XLogReaderState *record)
> +heap_xlog_prune_freeze(XLogReaderState *record)
> {
> XLogRecPtr lsn = record->EndRecPtr;
> - xl_heap_prune *xlrec = (xl_heap_prune *) XLogRecGetData(record);
> + char *ptr;
> + xl_heap_prune *xlrec;
> Buffer buffer;
> RelFileLocator rlocator;
> BlockNumber blkno;
> XLogRedoAction action;
>
> XLogRecGetBlockTag(record, 0, &rlocator, NULL, &blkno);
> + ptr = XLogRecGetData(record);

I don't love having two different pointers that alias each other and we
don't know which one is for what. Perhaps we could memcpy xlrec like in
my attached diff (log_updates.diff). It also might perform slightly
better than accessing flags through a xl_heap_prune
* -- since it wouldn't be doing pointer dereferencing.

> + xlrec = (xl_heap_prune *) ptr;
> + ptr += SizeOfHeapPrune;
>
> /*
> - * We're about to remove tuples. In Hot Standby mode, ensure that there's
> - * no queries running for which the removed tuples are still visible.
> + * We will take an ordinary exclusive lock or a cleanup lock depending on
> + * whether the XLHP_CLEANUP_LOCK flag is set. With an ordinary exclusive
> + * lock, we better not be doing anything that requires moving existing
> + * tuple data.
> */
> - if (InHotStandby)
> - ResolveRecoveryConflictWithSnapshot(xlrec->snapshotConflictHorizon,
> - xlrec->isCatalogRel,
> + Assert((xlrec->flags & XLHP_CLEANUP_LOCK) != 0 ||
> + (xlrec->flags & (XLHP_HAS_REDIRECTIONS | XLHP_HAS_DEAD_ITEMS)) == 0);
> +
> + /*
> + * We are about to remove and/or freeze tuples. In Hot Standby mode,
> + * ensure that there are no queries running for which the removed tuples
> + * are still visible or which still consider the frozen xids as running.
> + * The conflict horizon XID comes after xl_heap_prune.
> + */
> + if (InHotStandby && (xlrec->flags & XLHP_HAS_CONFLICT_HORIZON) != 0)
> + {

My attached patch has a TODO here for the comment. It sticks out that
the serialization and deserialization conditions are different for the
snapshot conflict horizon. We don't deserialize if InHotStandby is
false. That's still correct now because we don't write anything else to
the main data chunk afterward. But, if someone were to add another
member after snapshot_conflict_horizon, they would want to know to
deserialize snapshot_conflict_horizon first even if InHotStandby is
false.

> + TransactionId snapshot_conflict_horizon;
> +

I would throw a comment in about the memcpy being required because the
snapshot_conflict_horizon is in the buffer unaligned.

> + memcpy(&snapshot_conflict_horizon, ptr, sizeof(TransactionId));
> + ResolveRecoveryConflictWithSnapshot(snapshot_conflict_horizon,
> + (xlrec->flags & XLHP_IS_CATALOG_REL) != 0,
> rlocator);
> + }
>
> /*

> +
> +/*
> + * Given a MAXALIGNed buffer returned by XLogRecGetBlockData() and pointed to
> + * by cursor and any xl_heap_prune flags, deserialize the arrays of
> + * OffsetNumbers contained in an XLOG_HEAP2_PRUNE_FREEZE record.
> + *
> + * This is in heapdesc.c so it can be shared between heap2_redo and heap2_desc
> + * code, the latter of which is used in frontend (pg_waldump) code.
> + */
> +void
> +heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
> + int *nredirected, OffsetNumber **redirected,
> + int *ndead, OffsetNumber **nowdead,
> + int *nunused, OffsetNumber **nowunused,
> + int *nplans, xlhp_freeze_plan **plans,
> + OffsetNumber **frz_offsets)
> +{
> + if (flags & XLHP_HAS_FREEZE_PLANS)
> + {
> + xlhp_freeze_plans *freeze_plans = (xlhp_freeze_plans *) cursor;
> +
> + *nplans = freeze_plans->nplans;
> + Assert(*nplans > 0);
> + *plans = freeze_plans->plans;
> +
> + cursor += offsetof(xlhp_freeze_plans, plans);
> + cursor += sizeof(xlhp_freeze_plan) * *nplans;
> + }

I noticed you decided to set these in the else statements. Is that to
emphasize that it is important to correctness that they be properly
initialized?

> + else
> + {
> + *nplans = 0;
> + *plans = NULL;
> + }
> +

Thanks again for all your work on this set!

- Melanie

Attachment Content-Type Size
log_updates.diff text/x-diff 3.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Melanie Plageman 2024-03-24 20:12:17 Re: BitmapHeapScan streaming read user and prelim refactoring
Previous Message Tom Lane 2024-03-24 18:32:06 Re: pg_dump versus enum types, round N+1