
Re: Add index scan progress to pg_stat_progress_vacuum

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	"Imseih (AWS), Sami" <simseih(at)amazon(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, "Bossart, Nathan" <bossartn(at)amazon(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Add index scan progress to pg_stat_progress_vacuum
Date:	2022-04-07 15:38:36
Message-ID:	CAD21AoBduTv=AQS_V0or50Fdbz7NjS2o4EWnMCaXTJ9yYJr7ew@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Apr 7, 2022 at 10:20 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Wed, Apr 6, 2022 at 5:22 PM Imseih (AWS), Sami <simseih(at)amazon(dot)com> wrote:
> > > At the beginning of a parallel operation, we allocate a chunk of
> > > dynamic shared memory which persists even after some or all workers
> > > have exited. It's only torn down at the end of the parallel operation.
> > > That seems like the appropriate place to be storing any kind of data
> > > that needs to be propagated between parallel workers. The current
> > > patch uses the main shared memory segment, which seems unacceptable to
> > > me.
> >
> > Correct, DSM does track shared data. However, only participating
> > processes in the parallel vacuum can attach to and look up this data.
> >
> > The purpose of the main shared memory is to allow a process that
> > is querying the progress views to retrieve the information.
>
> Sure, but I think that you should likely be doing what Andres
> recommended before:
>
> # Why isn't the obvious thing to do here to provide a way to associate workers
> # with their leaders in shared memory, but to use the existing progress fields
> # to report progress? Then, when querying progress, the leader and workers
> # progress fields can be combined to show the overall progress?
>
> That is, I am imagining that you would want to use DSM to propagate
> data from workers back to the leader and then have the leader report
> the data using the existing progress-reporting facilities. Now, if we
> really need a whole row from each worker, that doesn't work, but why do
> we need that?

I also proposed the same idea before[1]. The leader can know how many
indexes have been processed so far by checking PVIndStats.status, which
is allocated in DSM for each index. We can have the leader check it and
update the progress information before and after vacuuming an index. If
we want to update the progress information in a more timely manner, we
could probably pass a callback function to ambulkdelete and
amvacuumcleanup so that the leader can do that periodically, e.g., every
1000 blocks, while vacuuming an index.

Regards,

[1] https://www.postgresql.org/message-id/CAD21AoBW6SMJ96CNoMeu%2Bf_BR4jmatPcfVA016FdD2hkLDsaTA%40mail.gmail.com

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

In response to

Re: Add index scan progress to pg_stat_progress_vacuum at 2022-04-07 13:20:04 from Robert Haas

Responses

Re: Add index scan progress to pg_stat_progress_vacuum at 2022-04-07 23:25:01 from Greg Stark
