Re: [PoC] Improve dead tuple storage for lazy vacuum

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Date: 2021-07-08 05:30:59
Message-ID: CAD21AoAATNw99uY_hEwA+OqVFYM5kgu5DOesh9CyBSxkav5mfw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jul 7, 2021 at 11:25 PM Matthias van de Meent
<boekewurm+postgres(at)gmail(dot)com> wrote:
>
> On Wed, 7 Jul 2021 at 13:47, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > Hi all,
> >
> > Index vacuuming is one of the most time-consuming processes in lazy
> > vacuuming. lazy_tid_reaped() is a large part among them. The attached
> > the flame graph shows a profile of a vacuum on a table that has one index
> > and 80 million live rows and 20 million dead rows, where
> > lazy_tid_reaped() accounts for about 47% of the total vacuum execution
> > time.
> >
> > [...]
> >
> > Overall, 'rtbm' has a much better lookup performance and good memory
> > usage especially if there are relatively many dead tuples. However, in
> > some cases, 'intset' and 'array' have a better memory usage.
>
> Those are some great results, with a good path to meaningful improvements.
>
> > Feedback is very welcome. Thank you for reading the email through to the end.
>
> The current available infrastructure for TIDs is quite ill-defined for
> TableAM authors [0], and other TableAMs might want to use more than
> just the 11 bits in use by max-BLCKSZ HeapAM MaxHeapTuplesPerPage to
> identify tuples. (MaxHeapTuplesPerPage is 1169 at the maximum 32k
> BLCKSZ, which requires 11 bits to fit).
>
> Could you also check what the (performance, memory) impact would be if
> these proposed structures were to support the maximum
> MaxHeapTuplesPerPage of 1169 or the full uint16-range of offset
> numbers that could be supported by our current TID struct?

I think tbm will be the most affected by the memory impact of the
larger maximum MaxHeapTuplesPerPage. For example, with 32kB blocks
(MaxHeapTuplesPerPage = 1169), even if there is only one dead tuple in
a block, it will always require at least 147 bytes per block.

Rtbm chooses the container type among array, bitmap, or run depending
on the number and distribution of dead tuples in a block, and only
bitmap containers can be searched with O(1). Run containers depend on
the distribution of dead tuples within a block. So let’s compare array
and bitmap containers.

With 8kB blocks (MaxHeapTuplesPerPage = 291), 36 bytes are needed for
a bitmap container at maximum. In other words, when compared to an
array container, bitmap will be chosen if there are more than 18 dead
tuples in a block. On the other hand, with 32kB blocks
(MaxHeapTuplesPerPage = 1169), 147 bytes are needed for a bitmap
container at maximum, so bitmap container will be chosen if there are
more than 74 dead tuples in a block. And, with full uint16-range
(MaxHeapTuplesPerPage = 65535), 8192 bytes are needed at maximum, so
bitmap container will be chosen if there are more than 4096 dead
tuples in a block. Therefore, in any case, if more than about 6% of
tuples in a block are garbage, a bitmap container will be chosen and
bring a faster lookup performance. (Of course, if a run container is
chosen, the container size gets smaller but the lookup performance is
O(logN).) But if the number of dead tuples in the table is small and
we have the larger MaxHeapTuplesPerPage, it’s likely to choose an
array container, and the lookup performance becomes O(logN). Still, it
should be faster than the array data structure because the range of
search targets in an array container is much smaller.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Drouvot, Bertrand 2021-07-08 05:35:58 Re: visibility map corruption
Previous Message Michael Paquier 2021-07-08 05:01:05 Re: bugfix: when the blocksize is 32k, the function page_header of pageinspect returns negative numbers.