Re: parallel vacuum comments

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Geoghegan <pg(at)bowt(dot)ie>
Subject: Re: parallel vacuum comments
Date: 2021-12-13 04:54:43
Message-ID: CAD21AoAR34jZFix3Mi-M4tnXU8k-TcooEoVtvbVD6y3Om+kmSw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Dec 13, 2021 at 12:03 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Sat, Dec 11, 2021 at 8:30 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Sat, Dec 11, 2021 at 2:32 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > >
> > > Hi,
> > >
> > > On 2021-10-30 14:21:01 -0700, Andres Freund wrote:
> > > > Due to bug #17245: [1] I spent a considerably amount of time looking at vacuum
> > > > related code. And I found a few things that I think could stand improvement:
> >
> > Thank you for the comments.
> >
> > >
> > > While working on the fix for #17255 (more specifically some cleanup that Peter
> > > suggested in the context), I noticed another thing: Initializing parallelism
> > > as part of dead_items_alloc() is a bad idea. Even if there are comments noting
> > > that oddity.
> > >
> > > I don't really see why we should do it this way? There's no "no-parallelism"
> > > path in begin_parallel_vacuum() besides compute_parallel_vacuum_workers(). So
> > > it's not like we might just discover the inability to do parallelism during
> > > parallel initialization?
> >
> > Right. Also, in parallel vacuum case, it allocates the space not only
> > for dead items but also other data required to do parallelism like
> > shared bulkdeletion results etc. Originally, in PG13,
> > begin_parallel_vacuum() was called by lazy_scan_heap() but in PG14 it
> > became part of dead_items_alloc() (see b4af70cb2). I agree to change
> > this part so that lazy_scan_heap() calls begin_parallel_vacuum()
> > (whatever we rename it). I'll incorporate this change in the
> > refactoring patch barring any objections.
> >
> > >
> > > It's also not particularly helpful to have a begin_parallel_vacuum() that
> > > might not actually begin a parallel vacuum...
> >
> > During the development, I found that we have some begin_* functions
> > that don't start the actual parallel job but prepare state data for
> > starting parallel job and referred to _bt_begin_parallel() so I named
> > begin_parallel_vacuum(). But I admit that considering what the
> > function actually does, something like
> > create_parallel_vacuum_context() would be clearer.
> >
>
> How about if we name it as parallel_vacuum_init() which will be
> similar InitializeParallelDSM, ExecInitParallelPlan().

parallel_vacuum_init() sounds better.

> Now, I see
> there is some reasoning to keep it in dead_items_alloc as both
> primarily allocate memory for vacuum but maybe we should name the
> function vacuum_space_alloc instead of dead_items_alloc and similarly
> rename dead_items_cleanup to vacuum_space_free. The other idea could
> be to bring begin_parallel_vacuum() back in lazy_scan_heap() but I
> personally prefer the idea to keep it where it is but improve function
> names. Will it be better to do this as a separate patch as 0002
> because this might require some change in the vacuum code path?

Yeah, if we do just renaming functions, I think we can do that in 0001
patch. On the other hand, if we need to change the logic, it's better
to do that in a separate patch.

>
> > >
> > > Minor nit:
> > >
> > > begin_parallel_vacuum()'s comment says:
> > > * On success (when we can launch one or more workers), will set dead_items and
> > > * lps in vacrel for caller.
> > >
> > > But it actually doesn't know whether we can start workers. It just checks
> > > max_parallel_maintenance_workers, no?
> >
> > Yes, we cannot know whether we can actually start workers when
> > starting parallel index vacuuming. It returns non-NULL if we request
> > one or more workers.
> >
>
> So can we adjust the comments? I think the part of the sentence "when
> we can launch one or more workers" seems to be the cause of confusion,
> can we remove it?

Yes, we can remove it. Or replace "can launch" with "request".

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2021-12-13 05:02:55 Re: parallel vacuum comments
Previous Message Thomas Munro 2021-12-13 04:51:00 Re: Add client connection check during the execution of the query