Re: [PoC] Improve dead tuple storage for lazy vacuum

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: John Naylor <johncnaylorls(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Date: 2024-01-12 08:49:04
Message-ID: CAD21AoCqqSSD=B4OWbsfHp_0tm+qWGvaa=tzUv5gxqreQOF5cA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jan 11, 2024 at 9:28 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Mon, Jan 8, 2024 at 8:35 PM John Naylor <johncnaylorls(at)gmail(dot)com> wrote:
> >
> > On Wed, Jan 3, 2024 at 9:10 PM John Naylor <johncnaylorls(at)gmail(dot)com> wrote:
> > >
> > > On Tue, Jan 2, 2024 at 8:01 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > > I agree that we expose RT_LOCK_* functions and have tidstore use them,
> > > > but am not sure the if (TidStoreIsShared(ts) LWLockAcquire(..., ...)"
> > > > calls part. I think that even if we expose them, we will still need to
> > > > do something like "if (TidStoreIsShared(ts))
> > > > shared_rt_lock_share(ts->tree.shared)", no?
> > >
> > > I'll come back to this topic separately.
> >
> > To answer your question, sure, but that "if (TidStoreIsShared(ts))"
> > part would be pushed down into a function so that only one place has
> > to care about it.
> >
> > However, I'm starting to question whether we even need that. Meaning,
> > lock the tidstore separately. To "lock the tidstore" means to take a
> > lock, _separate_ from the radix tree's internal lock, to control
> > access to two fields in a separate "control object":
> >
> > +typedef struct TidStoreControl
> > +{
> > + /* the number of tids in the store */
> > + int64 num_tids;
> > +
> > + /* the maximum bytes a TidStore can use */
> > + size_t max_bytes;
> >
> > I'm pretty sure max_bytes does not need to be in shared memory, and
> > certainly not under a lock: Thinking of a hypothetical
> > parallel-prune-phase scenario, one way would be for a leader process
> > to pass out ranges of blocks to workers, and when the limit is
> > exceeded, stop passing out blocks and wait for all the workers to
> > finish.
>
> True. I agreed that it doesn't need to be under a lock anyway, as it's
> read-only.
>
> >
> > As for num_tids, vacuum previously put the similar count in
> >
> > @@ -176,7 +179,8 @@ struct ParallelVacuumState
> > PVIndStats *indstats;
> >
> > /* Shared dead items space among parallel vacuum workers */
> > - VacDeadItems *dead_items;
> > + TidStore *dead_items;
> >
> > VacDeadItems contained "num_items". What was the reason to have new
> > infrastructure for that count? And it doesn't seem like access to it
> > was controlled by a lock -- can you confirm? If we did get parallel
> > pruning, maybe the count would belong inside PVShared?
>
> I thought that since the tidstore is a general-purpose data structure
> the shared counter should be protected by a lock. One thing I'm
> concerned about is that we might need to update both the radix tree
> and the counter atomically in some cases. But that's true we don't
> need it for lazy vacuum at least for now. Even given the parallel scan
> phase, probably we won't need to have workers check the total number
> of stored tuples during a parallel scan.
>
> >
> > The number of tids is not that tightly bound to the tidstore's job. I
> > believe tidbitmap.c (a possible future client) doesn't care about the
> > global number of tids -- not only that, but AND/OR operations can
> > change the number in a non-obvious way, so it would not be convenient
> > to keep an accurate number anyway. But the lock would still be
> > mandatory with this patch.
>
> Very good point.
>
> >
> > If we can make vacuum work a bit closer to how it does now, it'd be a
> > big step up in readability, I think. Namely, getting rid of all the
> > locking logic inside tidstore.c and let the radix tree's locking do
> > the right thing. We'd need to make that work correctly when receiving
> > pointers to values upon lookup, and I already shared ideas for that.
> > But I want to see if there is any obstacle in the way of removing the
> > tidstore control object and it's separate lock.
>
> So I agree to remove both max_bytes and num_items from the control
> object.Also, as you mentioned, we can remove the tidstore control
> object itself. TidStoreGetHandle() returns a radix tree handle, and we
> can pass it to TidStoreAttach(). I'll try it.
>

I realized that if we remove the whole tidstore control object
including max_bytes, processes who attached the shared tidstore cannot
use TidStoreIsFull() actually as it always returns true. Also they
cannot use TidStoreReset() as well since it needs to pass max_bytes to
RT_CREATE(). It might not be a problem in terms of lazy vacuum, but it
could be problematic for general use. If we remove it, we probably
need a safeguard to prevent those who attached the tidstore from
calling these functions. Or we can keep the control object but remove
the lock and num_tids.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2024-01-12 09:27:25 plpgsql memory leaks
Previous Message Bertrand Drouvot 2024-01-12 07:15:44 Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed