Re: Combine Prune and Freeze records emitted by vacuum

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>
Subject: Re: Combine Prune and Freeze records emitted by vacuum
Date: 2024-03-20 19:17:10
Message-ID: 20240320191710.y6b4hterii5qf76y@liskov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 20, 2024 at 03:15:49PM +0200, Heikki Linnakangas wrote:
> On 20/03/2024 03:36, Melanie Plageman wrote:
> > On Mon, Mar 18, 2024 at 01:15:21AM +0200, Heikki Linnakangas wrote:
> > > On 15/03/2024 02:56, Melanie Plageman wrote:
> > > > Okay, so I was going to start using xl_heap_prune for vacuum here too,
> > > > but I realized it would be bigger because of the
> > > > snapshotConflictHorizon. Do you think there is a non-terrible way to
> > > > make the snapshotConflictHorizon optional? Like with a flag?
> > >
> > > Yeah, another flag would do the trick.
> >
> > Okay, I've done this in attached v4 (including removing
> > XLOG_HEAP2_VACUUM). I had to put the snapshot conflict horizon in the
> > "main chunk" of data available at replay regardless of whether or not
> > the record ended up including an FPI.
> >
> > I made it its own sub-record (xlhp_conflict_horizon) less to help with
> > alignment (though we can use all the help we can get there) and more to
> > keep it from getting lost. When you look at heapam_xlog.h, you can see
> > what a XLOG_HEAP2_PRUNE record will contain starting with the
> > xl_heap_prune struct and then all the sub-record types.
>
> Ok, now that I look at this, I wonder if we're being overly cautious about
> the WAL size. We probably could just always include the snapshot field, and
> set it to InvalidTransactionId and waste 4 bytes when it's not needed. For
> the sake of simplicity. I don't feel strongly either way though, the flag is
> pretty simple too.

That will mean that all vacuum records are at least 3 bytes bigger than
before -- which makes it somewhat less defensible to get rid of
xl_heap_vacuum.

That being said, I ended up doing an unaligned access when I
packed it and made it optional, so maybe it is less user-friendly.

But I also think that making it optional is more clear for vacuum which
will never use it.

> I realized that the WAL record format changes are pretty independent from
> the rest of the patches. They could be applied before the rest. Without the
> rest of the changes, we'll still write two WAL records per page in vacuum,
> one to prune and another one to freeze, but it's another meaningful
> incremental step. So I reshuffled the patches, so that the WAL format is
> changed first, before the rest of the changes.

Ah, great idea! That eliminates the issue of preliminary commits having
larger WAL records that then get streamlined.

> 0001-0008: These are the WAL format changes. There's some comment cleanup
> needed, but as far as the code goes, I think these are pretty much ready to
> be squashed & committed.

My review in this email is *only* for 0001-0008. I have not looked at
the rest yet.

> From 06d5ff5349a8aa95cbfd06a8043fe503b7b1bf7b Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi>
> Date: Wed, 20 Mar 2024 14:50:14 +0200
> Subject: [PATCH v5 01/26] Merge prune, freeze and vacuum WAL record formats
>
> The new combined WAL record is now used for pruning, freezing and 2nd
> pass of vacuum. This is in preparation of changing vacuuming to write
> a combined prune+freeze record per page, instead of separate two
> records. The new WAL record format now supports that, but the code
> still always writes separate records for pruning and freezing.

Attached patch changes-for-0001.patch has a bunch of updated comments --
especially for heapam_xlog.h (including my promised diagram). And a few
suggestions (mostly changes that I should have made before).

>
> XXX I tried to lift-and-shift the code from v4 patch set as unchanged
> as possible, for easier review, but some noteworthy changes:

In the final commit message, I think it is worth calling out that all of
those freeze logging functions heap_log_freeze_eq/cmp/etc are lifted and
shifted from one file to another. When I am reviewing a big diff, it is
always helpful to know where I need to review line-by-line.

>
> - Instead of passing PruneState and PageFreezeResult to
> log_heap_prune_and_freeze(), pass the arrays of frozen, redirected
> et al offsets directly. That way, it can be called from other places.

good idea.

> - moved heap_xlog_deserialize_prune_and_freeze() from xactdesc.c to
> heapdesc.c. (Because that's clearly where it belongs)

:)

> From cd6cdaebb362b014733e99ecd868896caf0fb3aa Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi>
> Date: Wed, 20 Mar 2024 13:45:01 +0200
> Subject: [PATCH v5 02/26] Keep the original numbers for existing WAL records
>
> Doesn't matter much because the WAL format is not compatible across
> major versions anyway. But still seems nice to keep the identifiers
> unchanged when we can. (There's some precedence for this if you search
> the git history for "is free, was").

sounds good.

> From d3207bb557aa1d2868a50d357a06318a6c0cb5cd Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi>
> Date: Wed, 20 Mar 2024 13:48:29 +0200
> Subject: [PATCH v5 03/26] Rename record to XLOG_HEAP2_PRUNE_FREEZE
>
> To clarify that it also freezes now, and to make it clear that it's
> significantly different from the old XLOG_HEAP2_PRUNE format.

+1

> From 5d6fc2ffbdd839e0b69242af16446a46cf6a2dc7 Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi>
> Date: Wed, 20 Mar 2024 13:49:59 +0200
> Subject: [PATCH v5 04/26] 'nplans' is a pointer
>
> I'm surprised the compiler didn't warn about this

oops. while looking at this, I noticed that the asserts I added that
nredirected > 0, ndead > 0, and nunused > 0 have the same problem.

> ---
> src/backend/access/rmgrdesc/heapdesc.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
> index 8b94c869faf..9ef8a745982 100644
> --- a/src/backend/access/rmgrdesc/heapdesc.c
> +++ b/src/backend/access/rmgrdesc/heapdesc.c
> @@ -155,8 +155,7 @@ heap_xlog_deserialize_prune_and_freeze(char *cursor, uint8 flags,
> cursor += sizeof(OffsetNumber) * *nunused;
> }
>
> - if (nplans > 0)

> From 59f3f80f82ed7a63d86c991d0cb025e4cde2caec Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi>
> Date: Wed, 20 Mar 2024 13:36:41 +0200
> Subject: [PATCH v5 06/26] Fix logging snapshot conflict horizon.
>
> - it was accessed without proper alignment, which won't work on
> architectures that are strict about alignment. Use memcpy.

wow, oops. thanks for fixing this!

> - in heap_xlog_prune_freeze, the code tried to access the xid with
> "(xlhp_conflict_horizon *) (xlrec + SizeOfHeapPrune);" But 'xlrec'
> was "xl_heap_prune *" rather than "char *". That happened to work,
> because sizeof(xl_heap_prune) == 1, but make it more robust by
> adding a cast to char *.

good catch.

> - remove xlhp_conflict_horizon and store a TransactionId directly. A
> separate struct would make sense if we needed to store anything else
> there, but for now it just seems like more code.

makes sense. I just want to make sure heapam_xlog.h makes it extra clear
that this is happening. I see your comment addition. If you combine it
with my comment additions in the attached patch for 0001, hopefully that
makes it clear enough.

> From 8af186ee9dd8c7dc20f37a69b34cab7b95faa43b Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi>
> Date: Wed, 20 Mar 2024 14:03:06 +0200
> Subject: [PATCH v5 07/26] Add comment to log_heap_prune_and_freeze().
>
> XXX: This should be rewritten, but I tried to at least list some
> important points.

Are you thinking that it needs to mention more things or that the things
it mentions need more detail?

> From b26e36ba8614d907a6e15810ed4f684f8f628dd2 Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi>
> Date: Wed, 20 Mar 2024 14:53:31 +0200
> Subject: [PATCH v5 08/26] minor refactoring in log_heap_prune_and_freeze()
>
> Mostly to make local variables more tightly-scoped.

So, I don't think you can move those sub-records into the tighter scope.
If you run tests with this applied, you'll see it crashes and a lot of
the data in the record is wrong. If you move the sub-record declarations
out to a wider scope, the tests pass.

The WAL record data isn't actually copied into the buffer until

recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_PRUNE_FREEZE);

after registering everything.
So all of those sub-records you made are out of scope the time it tries
to copy from them.

I spent the better part of a day last week trying to figure out what was
happening after I did the exact same thing. I must say that I found the
xloginsert API incredibly unhelpful on this point.

I would like to review the rest of the suggested changes in this patch
after we fix the issue I just mentioned.

- Melanie

Attachment Content-Type Size
changes-for-0001.patch text/x-diff 8.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2024-03-20 19:18:39 Re: Regression tests fail with musl libc because libpq.so can't be loaded
Previous Message Magnus Hagander 2024-03-20 19:16:52 Re: Possibility to disable `ALTER SYSTEM`