Re: Lock-free compaction. Why not?

From: Ahmed Yarub Hani Al Nuaimi <ahmedyarubhani(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Subject: Re: Lock-free compaction. Why not?
Date: 2024-07-21 14:42:12
Message-ID: CAF239vpZ3zn0zZqBEiP-hBXKorxumCbFifKjc9E-A==a7b6TXA@mail.gmail.com
Lists: pgsql-hackers

That clearly explains the problem. But this got me thinking: what if we
optimize both the index and the heap at the same time?
That is, whenever a heap tuple is moved to compact/defragment heap pages,
we would also move its index entry: create a new index tuple at the right
place in the index data files (in an index that has already had its dead
tuples removed and been internally defragmented, aka vacuumed) and then
delete the old one. The old index tuple could be deleted immediately after
the heap tuple is moved. I think this could both solve the bloating problem
and keep the table heap and the index in optimum shape. All of this would
be done lazily, so that these operations only run when the server is not
overwhelmed (or just using whatever logic our lazy vacuuming uses). What do
you think?
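
To make the difference concrete, here is a toy sketch in Python. It is not
PostgreSQL code: the function name, the "old"/"new" location tags, and the
old_readers flag are all hypothetical, and it only counts index entries,
ignoring page-level effects. It compares the case David describes in the
quoted message (old entries must stay while older transactions may still
need the old tuple version) with the immediate deletion proposed above.

```python
def relocate_tail(n_tuples, n_moved, old_readers):
    """Return (peak, final) index entry counts after moving the last
    n_moved tuples of a toy table to new locations.

    old_readers=True models older transactions that may still need the
    old tuple versions, so old index entries cannot be removed right away.
    old_readers=False models deleting the old entry immediately.
    """
    # One index entry per tuple: (key, location). Locations are just tags.
    index = {("key%d" % i, ("old", i)) for i in range(n_tuples)}
    pending = []  # old entries that have to wait for a later cleanup
    peak = len(index)

    for i in range(n_tuples - n_moved, n_tuples):
        key = "key%d" % i
        index.add((key, ("new", i)))      # entry pointing at the new location
        old_entry = (key, ("old", i))
        if old_readers:
            pending.append(old_entry)     # must stay visible for now
        else:
            index.discard(old_entry)      # assumed safe only with no old readers
        peak = max(peak, len(index))

    # Later cleanup pass, once nothing can still need the old versions.
    for entry in pending:
        index.discard(entry)
    return peak, len(index)


print(relocate_tail(100, 40, old_readers=True))   # (140, 100)
print(relocate_tail(100, 40, old_readers=False))  # (100, 100)
```

In this toy model the entry count peaks at 140 when the old entries have to
be kept around, and stays at 100 when they can be dropped right away. The
model says nothing about whether dropping them right away is actually safe.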

On Sat, Jul 20, 2024 at 10:52 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:

> On Sun, 21 Jul 2024 at 04:00, Ahmed Yarub Hani Al Nuaimi
> <ahmedyarubhani(at)gmail(dot)com> wrote:
> > 2- Can you point me to a resource explaining why this might lead to
> > index bloating?
>
> No resource links, but if you move a tuple to another page then you
> must also adjust the index. If you have no exclusive lock on the
> table, then you must assume older transactions still need the old
> tuple version, so you need to create another index entry rather than
> re-pointing the existing index entry's ctid to the new tuple version.
> It's not hard to imagine that would cause the index to become larger
> if you had to move some decent portion of the tuples to other pages.
>
> FWIW, I think it would be good if we had some easier way to compact
> tables without blocking concurrent users. My primary interest in TID
> Range Scans was to allow easier identification of tuples near the end
> of the heap that could be manually UPDATEd after a vacuum to allow the
> heap to be shrunk during the next vacuum.
>
> David
>
