From: | David Geier <geidav(dot)pg(at)gmail(dot)com> |
---|---|
To: | Peter Geoghegan <pg(at)bowt(dot)ie>, Melanie Plageman <melanieplageman(at)gmail(dot)com> |
Cc: | Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com> |
Subject: | Re: Eliminate redundant tuple visibility check in vacuum |
Date: | 2023-09-01 08:47:10 |
Message-ID: | ed6011e9-551f-6156-25c7-7e5ff068fcf4@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On 9/1/23 03:25, Peter Geoghegan wrote:
> On Thu, Aug 31, 2023 at 3:35 PM Melanie Plageman
> <melanieplageman(at)gmail(dot)com> wrote:
>> Any inserting transaction which aborts after heap_page_prune()'s
>> visibility check will now be of no concern to lazy_scan_prune(). Since
>> we don't do the visibility check again, we won't find the tuple
>> HEAPTUPLE_DEAD and thus won't have the problem of adding the tuple to
>> the array of tuples for vacuum to reap. This does mean that we won't
>> reap and remove tuples whose inserting transactions abort right after
>> heap_page_prune()'s visibility check. But, this doesn't seem like an
>> issue.
> It's definitely not an issue.
>
> The loop is essentially a hacky way of getting a consistent picture of
> which tuples should be treated as HEAPTUPLE_DEAD, and which tuples
> need to be left behind (consistent at the level of each page and each
> HOT chain, at least). Making that explicit seems strictly better.
>
>> They may not be removed until the next vacuum, but ISTM it is
>> actually worse to pay the cost of re-pruning the whole page just to
>> clean up that one tuple. Maybe if that renders the page all visible
>> and we can mark it as such in the visibility map -- but that seems
>> like a relatively narrow use case.
> The chances of actually hitting the retry are microscopic anyway. It
> has nothing to do with making sure that dead tuples from aborted
> tuples get removed for its own sake, or anything. Rather, the retry is
> all about making sure that all TIDs that get removed from indexes can
> only point to LP_DEAD stubs. Prior to Postgres 14, HEAPTUPLE_DEAD
> tuples with storage would very occasionally be left behind, which made
> life difficult in a bunch of other places -- for no good reason.
>
That makes sense and seems like a much better compromise. Thanks for the
explanations. Please update the comment to document the corner case and
how we handle it.
--
David Geier
(ServiceNow)
From | Date | Subject | |
---|---|---|---|
Next Message | Amit Kapila | 2023-09-01 09:15:48 | Re: persist logical slots to disk during shutdown checkpoint |
Previous Message | Peter Smith | 2023-09-01 08:29:06 | Re: Synchronizing slots from primary to standby |