Re: Proposal: Progressive explain

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Rafael Thofehrn Castro <rafaelthca(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: Progressive explain
Date: 2025-03-27 18:26:40
Message-ID: CA+TgmobrzeDep+Z1BPQqGNsCqTQ8M58wNNKJB_8Lwpwbqbz3GQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 26, 2025 at 5:58 PM Rafael Thofehrn Castro
<rafaelthca(at)gmail(dot)com> wrote:
> So implementation was done based on transaction nested level. Cleanup is only
> performed when AbortSubTransaction() is called in the same transaction nested
> level as the one where the query is running. This covers both PL/pgSQL exception
> blocks and savepoints.

Right. I think this is much closer to being correct. However, I
actually think it should look more like this:

void
AtEOSubXact_ProgressiveExplain(bool isCommit, int nestDepth)
{
if (activeQueryDesc != NULL &&
activeQueryXactNestLevel >= nestDepth)
{
if (isCommit)
elog(WARNING, "leaked progressive explain query descriptor");
ProgressiveExplainCleanup(NULL);
}
}

By including the is-commit case in there, we can catch any bugs where
things aren't cleaned up properly before a transaction is committed.
We generally want to test >= nestDepth instead of == nestDepth in case
multiple subtransaction levels abort all at once; I'm not sure it
matters here, but even if it isn't, it's best to be consistent with
the practice elsewhere. Having {Commit,Abort}SubTransaction pass the
nestDepth instead of calling GetCurrentTransactionNestLevel() also has
precedent e.g. see AtEOSubXact_HashTables.

As a further refinement, consider initializing
activeQueryXactNestLevel to -1 and resetting it to that value each
time you end a progressive EXPLAIN, so that activeQueryDesc != NULL if
and only if activeQueryXactNestLevel >= 0. Then the test above can be
simplified to just if (activeQueryXactNestLevel >= nestDepth).

The comment for ProgressiveExplainIsActive() is a copy-paste from
another function that you forgot to update. Also, the function body
could be reduced to a one-liner: return queryDesc == activeQueryDesc;

There is a comment in ExplainOnePlan() that says "Handle progressive
explain cleanup manually if enabled as it is not called as part of
ExecutorFinish," but standard_ExecutorFinish does indeed call
ProgressiveExplainFinish(), so either the comment is misleading or the
code is wrong.

standard_ExecutorFinish() makes its call to ProgressiveExplainFinish()
dependent on the value of the progressive_explain GUC, but that GUC
could be changed in mid-query via set_config() or a plpgsql function
that calls SET, which could result in skipping the cleanup even when
it's needed. I think you should make the call unconditional and make
it entirely the job of ProgressiveExplainFinish() to decide whether
any cleanup is needed.

ProgressiveExplainFinish() calls ProgressiveExplainCleanup() in most
cases, but sometimes just disables the timeout instead. I think this
is weird. I think you should just always call
ProgressiveExplainCleanup() and make sure it's robust and cleans up
however much or little is appropriate in all cases.

On the flip side, I can't really see why
dsa_detach(queryDesc->pestate->pe_a) needs to be done while holding
ProgressiveExplainHashLock. Why not just have
ProgressiveExplainFinish() call ProgressiveExplainCleanup(), and then
afterwards it can do the dsa_detach() in the caller? Then
ProgressiveExplainCleanup() no longer needs an argument.

ProgressiveExplainPrint() can save a level of indentation in a large
chunk of the function by understanding that elog(ERROR) does not
return. You don't need to wrap everything that follows in else {}.

In the documentation table called pg-stat-progress-explain-view, some
descriptions end with a period and others do not.

Calling ProgressiveExplainTrigger() directly from
ProgressiveExplainStartupTimeoutHandler() seems extremely scary --
we've tried hard to make our signal handlers do only very simple
things like setting a flag, and this one runs around the entire
PlanState tree modifying Very Important Things. I fear that's going to
be hard to make robust. Like, what happens if we're going around
trying to change ExecProcNode pointers while the calling code was also
going around trying to change ExecProcNode pointers? I can't quite see
how this won't result in chaos.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2025-03-27 18:32:27 Re: Remove restrictions in recursive query
Previous Message Renan Alves Fonseca 2025-03-27 18:21:27 Re: Remove restrictions in recursive query