Re: Conflicting updates of command progress

From: Antonin Houska <ah(at)cybertec(dot)at>
To: Sami Imseih <samimseih(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Conflicting updates of command progress
Date: 2025-04-24 09:33:45
Message-ID: 4746.1745487225@localhost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Sami Imseih <samimseih(at)gmail(dot)com> wrote:

> >> pgstat_progress_start_command should only be called once by the entry
> >> point for the
> >> command. In theory, we could end up in a situation where start_command
> >> is called multiple times during the same top-level command;
>
> > Not only in theory - it actually happens when CLUSTER is rebuilding indexes.
>
> In the case of CLUSTER, pgstat_progress_start_command is only called once,
> but pgstat_progress_update_param is called in the context of both CLUSTER
> and CREATE INDEX.

pgstat_progress_start_command() is called twice: First with
cmdtype=PROGRESS_COMMAND_CLUSTER, second with
PROGRESS_COMMAND_CREATE_INDEX. The first happens in cluster_rel() the second
in cluster_rel() -> rebuild_relation() -> finish_heap_swap() ->
reindex_relation() -> reindex_index().

It does not matter though, the current design only expects one command.

> > That's a possible approach. However, if the index build takes long time in the
> > CLUSTER case, the user will probably be interested in details about the index
> > build.
>
> I agree,
>
> >> Is there a repro that you can share that shows the weird values? It sounds like
> >> the repro is on top of [1]. Is that right?
>
> >> You can reproduce the similar problem by creating a trigger function that
> >> runs a progress-reporting command like COPY, and then COPY data into
> >> a table that uses that trigger.
>
> >> [2] https://commitfest.postgresql.org/patch/5282/
>
> In this case, pgstat_progress_start_command is actually called
> twice in the life of a single COPY command; the upper-level COPY
> command calls pgstat_progress_start_command and then the nested COPY
> command also does calls pgstat_progress_start_command.
>
> > I think that can be implemented by moving the progress related fields from
> > PgBackendStatus into a new structure and by teaching the backend to insert a
> > new instance of that structure into a shared hash table (dshash.c)
>
> I think this is a good idea in general to move the backend progress to
> shared memory.
> and with a new API that will deal with scenarios as described above.
> 1/ an (explicit) nested
> command was started by a top-level command, such as the COPY case above.
> 2/ a top-level command triggered some other progress code implicitly, such as
> CLUSTER triggering CREATE INDEX code.

Yes, I mean a new API. I imagine pgstat_progress_start_command() to initialize
the shared memory to track the "nested" command and to put the existing value
of MyBEEntry onto a stack. pgstat_progress_end_command() would then restore
the original value.

However, special care is needed for [2] because that's not necessarily
nesting: consider merge-joining two foreign tables, both using file_fdw. In
this case the pointers to the progress tracking shared memory would probably
have to be passed as function arguments.

(Note that the current progress tracking system also uses shared memory,
however in a less flexible way.)

> I also like the shared memory approach because we can then not have to use
> a message like the one introduced in f1889729dd3ab0 to support parallel index
> vacuum progress 46ebdfe164c61.

I didn't know about these patches. I'm not sure though if this needs to be
removed. Even if each worker updated the progress information separately
(would users appreciate that?), it should still send the progress information
to the leader before it (i.e. the worker) exits.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robin Haberkorn 2025-04-24 09:50:04 Re: [PATCH] contrib/xml2: xslt_process() should report XSLT-related error details
Previous Message Japin Li 2025-04-24 09:27:08 Disallow redundant indexes