From: | Melanie Plageman <melanieplageman(at)gmail(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Peter Geoghegan <pg(at)bowt(dot)ie> |
Subject: | Re: Combine Prune and Freeze records emitted by vacuum |
Date: | 2024-03-30 16:10:12 |
Message-ID: | CAAKRu_abm2tHhrc0QSQa==sHe=VA1=oz1dJMQYUOKuHmu+9Xrg@mail.gmail.com |
Lists: | pgsql-hackers |
On Sat, Mar 30, 2024 at 8:00 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Sat, Mar 30, 2024 at 1:57 AM Melanie Plageman
> <melanieplageman(at)gmail(dot)com> wrote:
> > I think that we are actually successfully removing more RECENTLY_DEAD
> > HOT tuples than in master with heap_page_prune()'s new approach, and I
> > think it is correct; but let me know if I am missing something.
>
> /me blinks.
>
> Isn't zero the only correct number of RECENTLY_DEAD tuples to remove?
At the top of the comment for heap_prune_chain() in master, it says
* If the item is an index-referenced tuple (i.e. not a heap-only tuple),
* the HOT chain is pruned by removing all DEAD tuples at the start of the HOT
* chain. We also prune any RECENTLY_DEAD tuples preceding a DEAD tuple.
* This is OK because a RECENTLY_DEAD tuple preceding a DEAD tuple is really
* DEAD, our visibility test is just too coarse to detect it.
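To make the rule in that comment concrete, here is a minimal sketch rather than the actual heap_prune_chain() code. The enum and the function prunable_prefix_len() are hypothetical names made up for illustration, and the model only knows three states, whereas the real code distinguishes more. It takes the visibility status of each tuple in one HOT chain, in chain order from the root, and returns how many tuples at the start of the chain can be removed.

```c
#include <stddef.h>

/* Simplified, hypothetical stand-ins for the visibility results. */
typedef enum { TUP_LIVE, TUP_RECENTLY_DEAD, TUP_DEAD } TupStatus;

/*
 * Return the number of tuples at the start of the chain that can be
 * removed: everything up to and including the last DEAD tuple seen
 * before the first LIVE tuple. A RECENTLY_DEAD tuple is only counted
 * if a DEAD tuple follows it in the chain.
 */
size_t prunable_prefix_len(const TupStatus *chain, size_t n)
{
    size_t prunable = 0;

    for (size_t i = 0; i < n; i++) {
        if (chain[i] == TUP_LIVE)
            break;              /* nothing past a live tuple is prunable */
        if (chain[i] == TUP_DEAD)
            prunable = i + 1;   /* covers any RECENTLY_DEAD tuples before it */
    }
    return prunable;
}
```

Under this model, a chain RECENTLY_DEAD -> DEAD -> LIVE yields 2, while RECENTLY_DEAD -> LIVE yields 0, so a RECENTLY_DEAD tuple on its own is never removed.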
Heikki had added a comment in one of his patches to the fast path for
HOT tuples at the top of heap_prune_chain():
* Note that we might first arrive at a dead heap-only tuple
* either while following a chain or here (in the fast path).
* Whichever path gets there first will mark the tuple unused.
*
* Whether we arrive at the dead HOT tuple first here or while
* following a chain above affects whether preceding RECENTLY_DEAD
* tuples in the chain can be removed or not. Imagine that you
* have a chain with two tuples: RECENTLY_DEAD -> DEAD. If we
* reach the RECENTLY_DEAD tuple first, the chain-following logic
* will find the DEAD tuple and conclude that both tuples are in
* fact dead and can be removed. But if we reach the DEAD tuple
* at the end of the chain first, when we reach the RECENTLY_DEAD
* tuple later, we will not follow the chain because the DEAD
* TUPLE is already 'marked', and will not remove the
* RECENTLY_DEAD tuple. This is not a correctness issue, and the
* RECENTLY_DEAD tuple will be removed by a later VACUUM.
My patch splits the tuples into HOT and non-HOT while gathering their
visibility information and first calls heap_prune_chain() on the
non-HOT tuples and then processes the yet unmarked HOT tuples in a
separate loop afterward. This will follow all of the chains and
process them completely as well as processing all HOT tuples which may
not be reachable from a valid chain. The fast path contains a special
check to ensure that line pointers for DEAD HOT tuples that were not
themselves HOT-updated (dead orphaned tuples from aborted HOT updates)
are still marked LP_UNUSED even though they are not reachable from a
valid HOT chain. By running this check after the chains have been
followed, we don't preclude ourselves from following all chains.
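
The ordering effect can be shown with a toy model. This is only a sketch under stated assumptions, not the patch itself: the Item struct and the functions prune_chain(), prune_single_pass() and prune_roots_first() are hypothetical, redirects and in-progress states are ignored, and "marked" simply means the item would be removed.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ITEMS 8
#define NO_NEXT (-1)

typedef enum { ST_LIVE, ST_RECENTLY_DEAD, ST_DEAD } Status;

typedef struct {
    Status status;
    bool heap_only;  /* true for HOT (heap-only) tuples, false for chain roots */
    int next;        /* index of the next chain member, or NO_NEXT */
    bool marked;     /* already scheduled for removal */
} Item;

/* Follow the chain from 'start'; mark everything up to the last DEAD tuple. */
static void prune_chain(Item *items, int nitems, int start)
{
    int chain[MAX_ITEMS];
    int nchain = 0, ndead = 0;

    assert(nitems <= MAX_ITEMS);
    for (int i = start; i != NO_NEXT && nchain < nitems; i = items[i].next) {
        /* Stop at a live tuple or at one some other path already marked. */
        if (items[i].marked || items[i].status == ST_LIVE)
            break;
        chain[nchain++] = i;
        if (items[i].status == ST_DEAD)
            ndead = nchain;
    }
    for (int k = 0; k < ndead; k++)
        items[chain[k]].marked = true;
}

static int count_marked(const Item *items, int nitems)
{
    int n = 0;
    for (int i = 0; i < nitems; i++)
        n += items[i].marked ? 1 : 0;
    return n;
}

/* One loop in item order, with the heap-only fast path interleaved. */
int prune_single_pass(Item *items, int nitems)
{
    for (int i = 0; i < nitems; i++) {
        if (items[i].marked)
            continue;
        if (!items[i].heap_only)
            prune_chain(items, nitems, i);
        else if (items[i].status == ST_DEAD && items[i].next == NO_NEXT)
            items[i].marked = true;  /* fast path: dead, not HOT-updated */
    }
    return count_marked(items, nitems);
}

/* Roots first, then the still-unmarked heap-only tuples. */
int prune_roots_first(Item *items, int nitems)
{
    for (int i = 0; i < nitems; i++)
        if (!items[i].heap_only)
            prune_chain(items, nitems, i);
    for (int i = 0; i < nitems; i++)
        if (items[i].heap_only && !items[i].marked &&
            items[i].status == ST_DEAD && items[i].next == NO_NEXT)
            items[i].marked = true;  /* orphaned dead heap-only tuple */
    return count_marked(items, nitems);
}
```

With a page where item 0 is a DEAD heap-only tuple and item 1 is a RECENTLY_DEAD root pointing at it, prune_single_pass() marks only item 0, while prune_roots_first() marks both, which is the difference described above.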
- Melanie