Quick Links

Re: RFC: Logging plan of the running query

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>
Cc:	pgsql-hackers(at)lists(dot)postgresql(dot)org, destrex271(at)gmail(dot)com
Subject:	Re: RFC: Logging plan of the running query
Date:	2025-04-01 18:52:30
Message-ID:	CA+TgmoatMzMedKRYAK5s3euxUohd6m00szL4Q6GGQj+NhF813A@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Fri, Mar 21, 2025 at 8:40 AM torikoshia <torikoshia(at)oss(dot)nttdata(dot)com> wrote:
> Rebased it again.
>
> On 2025-03-10 14:10, torikoshia wrote:
> > BTW the patch adds about 400 lines to explain.c and it may be better
> > to split the file as well as 9173e8b6046, but I leave it as it is for
> > now.
>
> This part remains unchanged.

Looking at ExplainAssembleLogOutput() is making me realize that
auto_explain is in serious need of some cleanup. That's not really the
fault of this patch, but the hack whereby we overwrite the [] that
would have surrounded the JSON output with {} is not very nice. I also
think that the auto_explain GUCs need rethinking. In theory, new
EXPLAIN options should be mirrored into auto_explain, but if you
compare ExplainAssembleLogOutput() to ExplainOnePlan(), you can see
that they are diverging. The PLANNING, SUMMARY, and SERIALIZE options
that are known to regular EXPLAIN aren't known to auto_explain, and
any customizable options that use explain_per_plan_hook won't be able
to work with auto_explain, either. Changing this is non-trivial
because SERIALIZE, for example, can't work the same way for
auto_explain as it does for EXPLAIN, and a full solution might also
require user-visible changes like replacing
auto_explain.log_<whatever> with auto_explain.explain, so I don't
really know. Maybe we should just live with it the way it is for now,
but it doesn't look very nice.

Rafael and I were just discussing off-list to what extent the parallel
query problems with his patch also apply to yours. His design makes
the problem easier to hit, I think, because setting
progressive_explain = on makes every backend attempt to dump a query
plan, whereas you'd have to hit the worker process with a signal and
just the right time. But the fundamental problem appears to be the
same. Another interesting wrinkle is that he settled on showing the
outermost running query whereas you settled on the innermost. If you
were showing the outermost, perhaps you could just prohibit
dump-the-query-plan on a parallel worker, but you don't really want to
prohibit it categorically, because the worker could be running an
inner query that could be dumped. So maybe you want to avoid setting
ActiveQueryDesc at the toplevel and then set/clear it for inner
queries. However, that's a bit weird, too. If we wanted to undertake a
bigger redesign here, we could try pushing the entire query plan
(including all subqueries) down to the worker, and just tell it which
part it's actually supposed to execute. However, that would have some
overhead and would arguably open up some risk of future bugs, since
passing subqueries as NULL is, I think, intended partly as a guard
against accidentally executing the wrong subqueries.

I don't think it's a good idea to put the logic to update
ActiveQueryDesc into ExecutorRun. I think that function should really
just call the hook, or standard_ExecutorRun. I don't know for sure
whether this work should be moved down into ExecutorRun or up into the
caller, but I don't think it should stay where it is.

My comment about the naming of WrapMultiExecProcNodesWithExplain() on
the other thread also applies here: MultiExecProcNode() is an
unrelated function. Likewise, WrapExecProcNodeWithExplain() misses
walking the node's subplan list. Also, I don't think an
ereport(DEBUG1, ...) is appropriate here.

Do we really need es->signaled? Couldn't we test es->analyze instead?

Do we really need ExecProcNodeOriginal? Can we find some way to reuse
ExecProcNodeReal instead of making the structure bigger?

I do think we should try to move some of this new code out into a
separate source file, but I'm not yet sure what we should call it. We
might want to share infrastructure with something what Rafael's patch,
which he called progressive EXPLAIN but which is really closer to a
general query progress facility, albeit perhaps too expensive to have
enabled by default.

Does the documentation typically mention when a function is
superuser-only? If so, it should do so here, too, using similar
wording.

It seems unnecessary to remark that you can't log a query plan after a
subtransaction abort, because the query wouldn't be running any more.
It's also true that you can't log a query after a toplevel abort, even
if you're still waiting for the aborted transaction to roll back. But
it also seems like once the aborted subtransaction is popped off the
stack and we return to the parent transaction, we should be able to
log any query running at the outer level. If this is meant to imply
that something doesn't work in this scenario, perhaps there is
something about the design that needs fixing.

Does ActiveQueryDesc really need to be exposed to the whole system?
Could it be a file-level variable?

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Re: RFC: Logging plan of the running query at 2025-03-21 12:40:00 from torikoshia

Responses

Re: RFC: Logging plan of the running query at 2025-04-01 19:05:02 from Sami Imseih
Re: RFC: Logging plan of the running query at 2025-04-03 05:32:16 from torikoshia

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Ranier Vilela	2025-04-01 18:53:08	Re: Small memory fixes for pg_createsubcriber
Previous Message	Nathan Bossart	2025-04-01 18:44:14	Re: Statistics Import and Export