Re: Add index scan progress to pg_stat_progress_vacuum

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "Imseih (AWS), Sami" <simseih(at)amazon(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, "Bossart, Nathan" <bossartn(at)amazon(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add index scan progress to pg_stat_progress_vacuum
Date: 2022-04-07 15:38:36
Message-ID: CAD21AoBduTv=AQS_V0or50Fdbz7NjS2o4EWnMCaXTJ9yYJr7ew@mail.gmail.com
Lists: pgsql-hackers

On Thu, Apr 7, 2022 at 10:20 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Wed, Apr 6, 2022 at 5:22 PM Imseih (AWS), Sami <simseih(at)amazon(dot)com> wrote:
> > > At the beginning of a parallel operation, we allocate a chunk of
> > > dynamic shared memory which persists even after some or all workers
> > > have exited. It's only torn down at the end of the parallel operation.
> > > That seems like the appropriate place to be storing any kind of data
> > > that needs to be propagated between parallel workers. The current
> > > patch uses the main shared memory segment, which seems unacceptable to
> > > me.
> >
> > Correct, DSM does track shared data. However, only participating
> > processes in the parallel vacuum can attach and lookup this data.
> >
> > The purpose of the main shared memory is to allow a process that
> > is querying the progress views to retrieve the information.
>
> Sure, but I think that you should likely be doing what Andres
> recommended before:
>
> # Why isn't the obvious thing to do here to provide a way to associate workers
> # with their leaders in shared memory, but to use the existing progress fields
> # to report progress? Then, when querying progress, the leader and workers
> # progress fields can be combined to show the overall progress?
>
> That is, I am imagining that you would want to use DSM to propagate
> data from workers back to the leader and then have the leader report
> the data using the existing progress-reporting facilities. Now, if we
> really need a whole row from each worker, that doesn't work, but why do
> we need that?

+1

I also proposed the same idea before [1]. The leader can know how many
indexes have been processed so far by checking PVIndStats.status, which
is allocated in DSM for each index. We can have the leader check it and
update the progress information before and after vacuuming each index.
If we want to update the progress information in a more timely manner,
we could probably pass a callback function to ambulkdelete and
amvacuumcleanup so that the leader can do that periodically, e.g.,
every 1000 blocks, while vacuuming an index.
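A rough, self-contained C sketch of the leader-side part of this idea follows. It is not PostgreSQL code: the enum, the structs, the 1000-block interval constant, and every function name are hypothetical stand-ins. The status array plays the role of the per-index PVIndStats.status fields in DSM, SketchProgress stands in for the leader's progress row (real code would presumably report through pgstat_progress_update_param()), and the callback shows how a hook passed into an index's bulk-delete or cleanup loop could refresh the count periodically.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical per-index status, loosely modeled on PVIndStats.status. */
typedef enum
{
    SKETCH_INDVAC_PENDING = 0,
    SKETCH_INDVAC_IN_PROGRESS,
    SKETCH_INDVAC_COMPLETED
} SketchIndVacStatus;

/*
 * One slot per index.  In PostgreSQL this would live in the parallel
 * vacuum's DSM segment; here it is ordinary memory, and a C11 atomic
 * stands in for whatever synchronization the real field would need.
 */
typedef struct
{
    _Atomic int status;         /* holds a SketchIndVacStatus value */
} SketchIndStats;

/* Stand-in for the leader's progress row. */
typedef struct
{
    long indexes_total;
    long indexes_processed;
} SketchProgress;

/*
 * Leader only: count the indexes marked completed and publish the count.
 * Meant to be called before and after the leader vacuums an index.
 */
static long
sketch_update_index_progress(SketchIndStats *stats, int nindexes,
                             SketchProgress *progress)
{
    long done = 0;

    assert(nindexes >= 0);
    for (int i = 0; i < nindexes; i++)
    {
        if (atomic_load(&stats[i].status) == SKETCH_INDVAC_COMPLETED)
            done++;
    }
    /* A real leader would report these via the progress-reporting API. */
    progress->indexes_total = nindexes;
    progress->indexes_processed = done;
    return done;
}

/* Hypothetical hook an index AM's bulk-delete/cleanup loop could call. */
typedef void (*SketchProgressCallback) (void *arg);

typedef struct
{
    SketchIndStats *stats;
    int nindexes;
    SketchProgress *progress;
    int ncalls;                 /* number of refreshes performed */
} SketchCallbackState;

static void
sketch_progress_callback(void *arg)
{
    SketchCallbackState *st = arg;

    sketch_update_index_progress(st->stats, st->nindexes, st->progress);
    st->ncalls++;
}

#define SKETCH_REPORT_INTERVAL 1000 /* blocks between refreshes */

/* Simulated index scan: fires the callback every SKETCH_REPORT_INTERVAL blocks. */
static void
sketch_scan_index(long nblocks, SketchProgressCallback cb, void *cb_arg)
{
    for (long blkno = 0; blkno < nblocks; blkno++)
    {
        /* ... bulk-delete or cleanup work for block blkno ... */
        if (cb != NULL && (blkno + 1) % SKETCH_REPORT_INTERVAL == 0)
            cb(cb_arg);
    }
}
```

In this sketch, workers only ever flip their own slot to completed; the leader alone computes and publishes the count, so no per-worker progress row is needed.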

Regards,

[1] https://www.postgresql.org/message-id/CAD21AoBW6SMJ96CNoMeu%2Bf_BR4jmatPcfVA016FdD2hkLDsaTA%40mail.gmail.com

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
