Re: Proposal: Progressive explain

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Rafael Thofehrn Castro <rafaelthca(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: Progressive explain
Date: 2025-03-28 14:30:02
Message-ID: CA+TgmoYKovPZTiqhz2=X_Yp7QDm8FOwtkxvtTaO3NhGdgNo1Ng@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Mar 27, 2025 at 9:38 PM Rafael Thofehrn Castro
<rafaelthca(at)gmail(dot)com> wrote:
> > Calling ProgressiveExplainTrigger() directly from
> > ProgressiveExplainStartupTimeoutHandler() seems extremely scary --
>
> Agreed. In that other similar patch to log query plans after a signal is sent
> from another session there was the same discussion and concerns.
>
> I don't see another way of doing it here. This patch became 100x more complex
> after I added GUC progressive_explain_min_duration, that required changing the
> execProcNode wrapper on the fly.
>
> I can see some alternatives here:
>
> A) Use a temporary execProcNode wrapper set at query start here:
>
> The problem here is that all queries would always be using the custom
> wrapper until timeout is triggered, and that can add unnecessary overhead.
>
> B) We get rid of the idea of applying progressive explains to non instrumented
> runs (which was my original idea). My main use-case here is to see progress
> of instrumented runs anyways. For that idea we have 2 possibilities:
>
> B1) Keep implementation as is, with all the existing GUCs to control what
> is included in the printed plan. We just change GUC progressive_explain to be
> a boolean again. If true, instrumentation will be enabled for the query being
> executed and progressive explain will be printed consecutively.
>
> B2) Get rid of all new GUCs I added and change the progressive explain feature
> to become an option of the EXPLAIN command. Something like:
>
> EXPLAIN (ANALYZE, PROGRESSIVE) SELECT * FROM ...
>
> (B1) would allow progressive explans in regular queries, but I'm not sure people
> would be willing to enable it globally as it adds instrumentation overhead.

I'm inclined to think that there isn't much value to this feature with
progressive_explain=explain. If you just want to print plans after a
certain amount of elapsed runtime, you can already use auto_explain to
do that. Unless there's some pretty large benefit to doing it with
this feature over what that already does, I think we shouldn't invent
a second method. Granted, auto_explain is a contrib module and this is
proposed as a core feature, but I feel like that use case of printing
the plan once is so different from what I see as the core value
proposition of this feature that I think it might just be confusing to
include it in scope. There's nothing actually "progressive" about
that, because you're just doing it once.

But having said that, I'm not quite sure I understand why you're
proposing (A) and (B1) as separate alternatives. Changing
progressive_explain to be a Boolean doesn't seem like it solves the
problem of needing to wrap ExecProcNode from a signal handler. The
only thing that seems to solve that problem is to instead do the
wrapping at the start of the query, which AIUI is (A). So I feel like
you should do (A) to solve the problem I thought we were talking about
and (B1) to make things simpler. Am I misunderstanding the trade-offs
here?

I did consider proposing (B2) yesterday, not so much because of this
issue but because I wondered whether it might just be a better
interface. But on reflection I decided it wasn't, because it removes
the option to just turn this feature on for all of your queries, which
somebody might want to do. Also, it would actually be kind of a weird
interface, because every other form of EXPLAIN returns the plan as
output and throws away the original query output; but this form would
store the plan ephemerally in a view and return the original output.
I'm sure we could make that work and find a way to document it but it
seems odd.

In general, I think we should err on the side of making this feature
safe even at some performance cost. If we put in place a version of
this feature that imposes a great performance overhead but does not
crash the server, some people will be unhappy, complain, and/or be
unable to use the feature. That will be sad. But, if we put in place a
version of this feature that performs great and occasionally crashes
the server, that will be much sadder. So let's do something we're very
confident is safe. There is always the opportunity to change things in
the future if we're more confident about some questionable choice then
than we are now -- performance optimization can be done after the
fact. But if it's not stable, the only thing we're likely to change in
the future is to revert the whole thing.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2025-03-28 14:43:39 Re: getting "shell command argument contains a newline or carriage return:" error with pg_dumpall when db name have new line in double quote
Previous Message Tom Lane 2025-03-28 14:22:08 Re: Test to dump and restore objects left behind by regression