Re: [HACKERS] CLUSTER command progress monitor

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Tatsuro Yamada <tatsuro(dot)yamada(dot)tf(at)nttcom(dot)co(dot)jp>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Subject: Re: [HACKERS] CLUSTER command progress monitor
Date: 2019-09-05 01:03:16
Message-ID: 20190905010316.GB14853@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Sep 04, 2019 at 09:18:39AM -0400, Robert Haas wrote:
> I think this is all going in the wrong direction. It's the
> responsibility of the people who are calling the pgstat_progress_*
> functions to only do so when it's appropriate. Having the
> pgstat_progress_* functions try to untangle whether or not they ought
> to ignore calls made to them is backwards.

Check.

> It seems to me that the problem here can be summarized in this way:
> there's a bunch of code reuse in the relevant paths here, and somebody
> wasn't careful enough when adding progress reporting for one of the
> new commands, and so now things are broken, because e.g. CLUSTER
> progress reporting gets ended by a pgstat_progress_end_command() call
> that was intended for some other utility command but is reached in the
> CLUSTER case anyway. The solution to that problem in my book is to
> figure out which commit broke it, and then the person who made that
> commit either needs to fix it or revert it.

I am not sure that it is right as well to say that the first committed
patch is right and that the follow-up ones are wrong. CLUSTER
progress was committed first (6f97457), followed a couple of days
after by CREATE INDEX (ab0dfc9) and then REINDEX (03f9e5c). So let's
have a look at them..

For CLUSTER, the progress starts and ends in cluster_rel(). CLUSTER
uses its code paths at the beginning, but then things get more
complicated, particularly with finish_heap_swap() which calls directly
reindex_table(). 6f97457 includes one progress update at point which
can be a problem per its shared nature in reindex_relation() with
PROGRESS_CLUSTER_INDEX_REBUILD_COUNT. This last part is wrong IMO,
why should cluster reporting take priority in this code path,
enforcing anything else?

For CREATE INDEX, the progress reporting starts and ends once in
DefineIndex(). However, we have updates of progress within each index
AM build routine, which could be taken by many code paths. Is it
actually fine to give priority to CREATE INDEX in those cases? Those
paths can as well be taken by REINDEX or CLUSTER (right?), so having a
counter for CREATE INDEX looks logically wrong to me. The part where
we wait for snapshots looks actually good from the perspective of
REINDEX CONCURRENTLY and CREATE INDEX CONCURRENTLY.

For REINDEX, we have a problematic start progress call in
reindex_index() which is for example called by reindex_relation for
each relation's index for a non-concurrent case (also in
ReindexRelationConcurrently()). I think that these are incorrect
locations, and I would have placed them in ReindexIndex(),
ReindexTable() and ReindexMultipleTables() so as we avoid anything
low-level. This has added two calls to pgstat_progress_update_param()
in reindex_index(), which is shared between all. Why would it be fine
to give the priority to a CREATE INDEX marker here if CLUSTER can also
cross this way?

On top of those issues, I see some problems with the current state of
affairs, and I am repeating myself:
- It is possible that pgstat_progress_update_param() is called for a
given command for a code path taken by a completely different
command, and that we have a real risk of showing a status completely
buggy as the progress phases share the same numbers.
- We don't consider wisely end and start progress handling for
cascading calls, leading to a risk where we start command A, but
for shared code paths where we assume that only command B can run then
the processing abruptly ends for command A.
- Is it actually fine to report information about a command completely
different than the one provided by the client? It is for example
possible to call CLUSTER, but show up to the user progress report
about PROGRESS_COMMAND_CREATE_INDEX (see reindex_index). This could
actually make sense if we are able to handle cascading progress
reports.

These are, at least it seems to me, fundamental problems we need to
ponder more about if we begin to include more progress reporting, and
we don't have that now, and that worries me.

> It's quite possible here that we need a bigger redesign to make adding
> progress reporting for new command easier and less prone to bugs, but
> I don't think it's at all clear what such a redesign should look like
> or even that we definitely need one, and September is not the right
> time to be redesigning features for the pending release.

Definitely.
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Edmund Horner 2019-09-05 01:06:56 Re: Tid scan improvements
Previous Message Jeremy Schneider 2019-09-05 01:01:05 Re: ERROR: multixact X from before cutoff Y found to be still running