Re: parallel vacuum comments

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Peter Geoghegan <pg(at)bowt(dot)ie>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: parallel vacuum comments
Date: 2021-12-07 01:24:14
Message-ID: CAD21AoAscK32GSgowR0DwFg7YvA4t7nFw_gez5LzKmc0HLp6Sg@mail.gmail.com
Lists: pgsql-hackers

On Mon, Dec 6, 2021 at 1:47 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Fri, Dec 3, 2021 at 6:06 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Fri, Dec 3, 2021 at 6:03 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Thu, Dec 2, 2021 at 6:01 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > >
> > > > I've attached updated patches.
> > > >
> > >
> > > I have a few comments on v4-0001.
> >
> > Thank you for the comments!
> >
> > > 1.
> > > In parallel_vacuum_process_all_indexes(), we can combine the two
> > > checks for vacuum/cleanup at the beginning of the function
> >
> > Agreed.
> >
> > > and I think
> > > it is better to keep the variable name as bulkdel or cleanup instead
> > > of vacuum as that is more specific and clear.
> >
> > I was thinking of using the terms "bulkdel" and "cleanup" instead of
> > "vacuum" and "cleanup" for the same reason. That way, we could use
> > "bulkdel" and "cleanup" when doing index bulk-deletion (i.e., calling
> > ambulkdelete) and index cleanup (calling amvacuumcleanup),
> > respectively, and use "vacuum" when processing an index, i.e., doing
> > either bulk-delete or cleanup, instead of using just "processing".
> > But we already use "vacuum" and "cleanup" in many places, e.g.,
> > lazy_vacuum_index() and lazy_cleanup_index(). If we want to use
> > "bulkdel" instead of "vacuum", I think it would be better to change
> > the terminology in vacuumlazy.c thoroughly, probably in a separate
> > patch.
> >
>
> Okay.
>
> > > 2. The patch seems to be calling parallel_vacuum_should_skip_index
> > > thrice even before starting parallel vacuum. It includes a call to
> > > find the number of blocks, which has to be performed for each index.
> > > I understand it might not be too costly to call this, but it seems
> > > better to remember this info as we do in the current code.
> >
> > Yes, we can bring the will_vacuum_parallel array back to the code.
> > That way, we can remove the call to parallel_vacuum_should_skip_index()
> > in parallel_vacuum_begin().
> >
> > > We can
> > > probably set parallel_workers_can_process in parallel_vacuum_begin and
> > > then update it again in parallel_vacuum_process_all_indexes. Won't
> > > doing something like that be better?
> >
> > parallel_workers_can_process can vary depending on whether it is a
> > bulk-deletion, the first cleanup, or a subsequent cleanup. What can we
> > set parallel_workers_can_process based on in parallel_vacuum_begin()?
> >
>
> I was thinking of setting the results of will_vacuum_parallel in
> parallel_vacuum_begin().
>
> > >
> > > 3. /*
> > > * Copy the index bulk-deletion result returned from ambulkdelete and
> > > @@ -2940,19 +2935,20 @@ parallel_process_one_index(Relation indrel,
> > > * Since all vacuum workers write the bulk-deletion result at different
> > > * slots we can write them without locking.
> > > */
> > > - if (shared_istat && !shared_istat->updated && istat_res != NULL)
> > > + if (!pindstats->istat_updated && istat_res != NULL)
> > > {
> > > - memcpy(&shared_istat->istat, istat_res, sizeof(IndexBulkDeleteResult));
> > > - shared_istat->updated = true;
> > > + memcpy(&(pindstats->istat), istat_res, sizeof(IndexBulkDeleteResult));
> > > + pindstats->istat_updated = true;
> > >
> > > /* Free the locally-allocated bulk-deletion result */
> > > pfree(istat_res);
> > > -
> > > - /* return the pointer to the result from shared memory */
> > > - return &shared_istat->istat;
> > > }
> > >
> > > I think that here we now copy the results both for local and parallel
> > > cleanup. Isn't it better to write something about that in the
> > > comments, as it is not clear from the current comments?
> >
> > What do you mean by "local cleanup"?
> >
>
> Clean-up performed by leader backend.

I might be missing your point, but I think the patch doesn't change
the behavior of this code. With the patch, we allocate
IndexBulkDeleteResult on DSM for every index, but the patch doesn't
change the fact that in parallel vacuum/cleanup cases we copy the
IndexBulkDeleteResult returned from ambulkdelete() or amvacuumcleanup()
to DSM space. Non-parallel vacuum doesn't use this function. Do you
have any suggestions for better comments here?
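For reference, this is the block from the diff you quoted, with comments
added to describe what it does. The wording is only a sketch of one
possible direction, not a concrete proposal for the patch:

    /*
     * parallel_process_one_index() is used only during parallel vacuum;
     * both the leader and the workers go through this path for each
     * index they process.
     */
    if (!pindstats->istat_updated && istat_res != NULL)
    {
        /*
         * Store the result returned by ambulkdelete/amvacuumcleanup into
         * this index's slot in the DSM segment. Every index has its own
         * slot, so the write needs no locking.
         */
        memcpy(&(pindstats->istat), istat_res, sizeof(IndexBulkDeleteResult));
        pindstats->istat_updated = true;

        /* The locally-allocated result is no longer needed. */
        pfree(istat_res);
    }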

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
