Re: [PoC] Improve dead tuple storage for lazy vacuum

From: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Date: 2022-12-09 08:53:01
Message-ID: CAFBsxsFdkG9bgXViC-EfCb-BySyA=kDq5xu8vc1HeE8QKHud+A@mail.gmail.com
Lists: pgsql-hackers

On Fri, Dec 9, 2022 at 8:20 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
wrote:

> In the meanwhile, I've been working on vacuum integration. There are
> two things I'd like to discuss some time:
>
> The first is the minimum of maintenance_work_mem, 1 MB. Since the
> initial DSA segment size is 1MB (DSA_INITIAL_SEGMENT_SIZE), parallel
> vacuum with radix tree cannot work with the minimum
> maintenance_work_mem. It will need to increase it to 4MB or so. Maybe
> we can start a new thread for that.

I don't think that'd be very controversial, but I'm also not sure why we'd
need 4MB -- can you explain in more detail what exactly we'd need so that
the feature would work? (The minimum doesn't have to work *well* IIUC, just
do some useful work and not fail).

> The second is how to limit the size of the radix tree to
> maintenance_work_mem. I think that it's tricky to estimate the maximum
> number of keys in the radix tree that fit in maintenance_work_mem. The
> radix tree size varies depending on the key distribution. The next
> idea I considered was how to limit the size when inserting a key. In
> order to strictly limit the radix tree size, probably we have to
> change the rt_set so that it breaks off and returns false if the radix
> tree size is about to exceed the memory limit when we allocate a new
> node or grow a node kind/class.

That seems complex, fragile, and the wrong scope.

> Ideally, I'd like to control the size
> outside of radix tree (e.g. TIDStore) since it could introduce
> overhead to rt_set() but probably we need to add such logic in radix
> tree.

Does the TIDStore have the ability to ask the DSA (or slab context) to see
how big it is? If a new segment has been allocated that brings us to the
limit, we can stop when we discover that fact. In the local case with slab
blocks, it won't be on nice neat boundaries, but we could check if we're
within the largest block size (~64kB) of overflow.

Remember when we discussed how we might approach parallel pruning? I
envisioned a local array of a few dozen kilobytes to reduce contention on
the tidstore. We could use such an array even for a single worker (always
doing the same thing is simpler anyway). When the array fills up enough so
that the next heap page *could* overflow it: Stop, insert into the store,
and check the store's memory usage before continuing.

--
John Naylor
EDB: http://www.enterprisedb.com
