Re: PATCH: Add REINDEX tag to event triggers

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: jian he <jian(dot)universality(at)gmail(dot)com>
Cc: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>, Garrett Thornburg <film42(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: PATCH: Add REINDEX tag to event triggers
Date: 2023-11-20 07:34:11
Message-ID: ZVsL84PYe-LoBGQx@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Oct 28, 2023 at 12:15:22PM +0800, jian he wrote:
> reindex_event_trigger_collect_relation called in
> ReindexMultipleInternal, ReindexTable (line 2979).
> Both are "under concurrent is false" branches.
>
> So it should be fine.

Sorry for the delay in replying.

Indeed, you're right. reindex_event_trigger_collect_relation() gets
only called in two paths for the non-concurrent cases just after
reindex_relation(), which is not a safe location to call it because
this may be used in CLUSTER or TRUNCATE, and the intention of the
patch is only to count for the objects in REINDEX.

Anyway, I think that the current implementation is incorrect. The
object is collected after the reindex is done, which is right.
However, an object may be reindexed but not be reported if it was
dropped between the reindex and its endwhen collecting all the objects
of a single relation. Perhaps it does not matter because the object
is gone, but that still looks incorrect to me because we finish by not
reporting everything we should. I think that we should put the
collection deeper into the stack, where we know that we are holding
the locks on the objects we are collecting. Another side effect is
that the patch seems to be missing toast table indexes, which are also
reindexed for a REINDEX TABLE, for instance. That's not right.

Actually, could there be an argument for pushing the collection down
to reindex_relation() instead to count for all the commands that?
Even better, reindex_index() would be a better candidate because it is
itself called within reindex_relation() for each individual indexes?
If a code path should not be covered for event triggers, we could use
a new REINDEXOPT_ to control that, optionally. In short, it looks
like we should try to reduce the number of paths calling
reindex_event_trigger_collect(), while collect_relation() ought to be
removed.

Should REINDEX TABLE CONCURRENTLY's coverage be extended so as two new
indexes are reported instead of just one?

REINDEX SCHEMA is a command that perhaps should be tested to collect
all the indexes rebuilt through it?

A side-effect of this patch is that it impacts ddl_command_start and
ddl_command_end with the addition of REINDEX. Mixing that with the
addition of a new event is strange. I think that the addition of
REINDEX to the existing events should be done first, as a separate
patch. Then a second patch should deal with the collection and the
support of the new event.

I have also dug into the archives to note that control commands have
been proposed in the past to be added as part of event triggers, and
it happens that I've mentioned in a comment that that this was a
concept perhaps contrary to what event triggers should do, as these
are intended for DDLs:
https://www.postgresql.org/message-id/CAB7nPqTqZ2-YcNzOQ5KVBUJYHQ4kDSd4Q55Mc-fBzM8GH0bV2Q@mail.gmail.com

Philosophically, I'm open on all that and having more commands
depending on the cases that are satisfied, FWIW, but let's organize
the changes.
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2023-11-20 07:42:00 Re: ALTER COLUMN ... SET EXPRESSION to alter stored generated column's expression
Previous Message Peter Eisentraut 2023-11-20 07:27:48 Re: meson documentation build open issues