Re: Eliminate redundant tuple visibility check in vacuum

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, David Geier <geidav(dot)pg(at)gmail(dot)com>
Subject: Re: Eliminate redundant tuple visibility check in vacuum
Date: 2023-09-07 19:09:54
Message-ID: CAAKRu_a9RSogxOSdT4Zjv64h2fJmeBcuM8TRLw1zOzOwFCWucg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Sep 7, 2023 at 1:37 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Wed, Sep 6, 2023 at 5:21 PM Melanie Plageman
> <melanieplageman(at)gmail(dot)com> wrote:
> > Yeah, I think this is a fair concern. I have addressed it in the
> > attached patches.
> >
> > I thought a lot about whether or not adding a PruneResult which
> > contains only the output parameters and result of heap_page_prune() is
> > annoying since we have so many other *Prune* data structures. I
> > decided it's not annoying. In some cases, four outputs don't merit a
> > new structure. In this case, I think it declutters the code a bit --
> > independent of any other patches I may be writing :)
>
> I took a look at 0001 and I think that it's incorrect. In the existing
> code, *off_loc gets updated before each call to
> heap_prune_satisfies_vacuum(). This means that if an error occurs in
> heap_prune_satisfies_vacuum(), *off_loc will as of that moment contain
> the relevant offset number. In your version, the relevant offset
> number will only be stored in some local structure to which the caller
> doesn't yet have access. The difference is meaningful. lazy_scan_prune
> passes off_loc as vacrel->offnum, which means that if an error
> happens, vacrel->offnum will have the right value, and so when the
> error context callback installed by heap_vacuum_rel() fires, namely
> vacuum_error_callback(), it can look at vacrel->offnum and know where
> the error happened. With your patch, I think that would no longer
> work.

You are right. That is a major problem. Thank you for finding that. I
was able to confirm the breakage by patching in an error to
heap_page_prune() after we have set off_loc and confirming that the
error context has the offset in master and is missing it with my patch
applied.

I can fix it by changing the type of PruneResult->off_loc to be an
OffsetNumber pointer. This does mean that PruneResult will be
initialized partially by heap_page_prune() callers. I wonder if you
think that undermines the case for making a new struct.

I still want to eliminate the NULL check of off_loc in
heap_page_prune() by making it a required parameter. Even though
on-access pruning does not have an error callback mechanism which uses
the offset, it seems better to have a pointless local variable in
heap_page_prune_opt() than to do a NULL check for every tuple pruned.

> I haven't run the regression suite with 0001 applied. I'm guessing
> that you did, and that they passed, which if true means that we don't
> have a test for this, which is a shame, although writing such a test
> might be a bit tricky. If there is a test case for this and you didn't
> run it, woops.

There is no test coverage for the vacuum error callback context
currently (tests passed for me). I looked into how we might add such a
test. First, I investigated what kind of errors might occur during
heap_prune_satisfies_vacuum(). Some of the multixact functions called
by HeapTupleSatisfiesVacuumHorizon() could error out -- for example
GetMultiXactIdMembers(). It seems difficult to trigger the errors in
GetMultiXactIdMembers(), as we would have to cause wraparound. It
would be even more difficult to ensure that we hit those specific
errors from a call stack containing heap_prune_satisfies_vacuum(). As
such, I'm not sure I can think of a way to protect future developers
from repeating my mistake--apart from improving the comment like you
mentioned.

- Melanie

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2023-09-07 19:14:50 Re: Should consider materializing the cheapest inner path in consider_parallel_nestloop()
Previous Message Robert Haas 2023-09-07 19:02:48 Re: GUC for temporarily disabling event triggers