Quick Links

Re: Proposal: Progressive explain

From:	Rafael Thofehrn Castro <rafaelthca(at)gmail(dot)com>
To:	Euler Taveira <euler(at)eulerto(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, torikoshia(at)oss(dot)nttdata(dot)com
Subject:	Re: Proposal: Progressive explain
Date:	2025-03-14 15:36:34
Message-ID:	CAG0ozMpwRQFNraWLGVa0zAFBYuY1HasZ6NX4BAM4H3OOrQHtDg@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Thanks for the valuable inputs Euler. Adjusted most of your recommendations.

> I found a crash. It is simple to reproduce.

Indeed, I failed to test plain EXPLAIN after the addition of the new GUC
progressive_explain_min_duration. This is fixed.

> You call this feature "progressive explain". My first impression is that
it
> will only provide the execution plans for EXPLAIN commands. Instead of
> "progressive explain", I would suggest "query progress" that is a general
> database terminology. It seems natural to use "progressive explain" since
you
> are using the explain infrastructure (including the same options --
format,
> settings, wal, ...) -- to print the execution plan.

Makes sense. Kept progressive explain for now but this is still open for
discussion.

> There is no use for the function argument. If you decide to keep this
function,
remove it.

Done.

> Why don't you use the pgstat_progress_XXX() API? Since you are using a
> pg_stat_progress_XXX view name I would expect using the command progress
> reporting infrastructure (see backend_progress.c).

I haven't changed that part as of now. My implementation and underlying data
structure may not work well with that API, but I am investigating.

> Maybe you could include datid and datname as the other progress reporting
> views. It would avoid a join to figure out what the database is.

Done.

> Isn't it the same definition as in auto_explain.c? Use only one
definition for
> auto_explain and this feature. You can put this struct into explain.c,
use an
> extern definition for guc_tables.c and put a extern PGDLLIMPORT defintion
into
> guc.h. See wal_level_options, for an example.

Done.

> The "analyze" is a separate option in auto_explain. Should we have 2
options?
> One that enable/disable this feature and another one that enable/disable
> analyze option.

Tomas Vondra proposed the current logic and I think it makes sense. Having
a
single GUC to control the execution behavior keeps the feature simpler IMO.

> Don't the other EXPLAIN options make sense here? Like serialize and
summary.

I added a missing GUC for option COSTS (progressive_explain_costs). Adding
the other ones doesn't make much sense IMO. SUMMARY, SERIALIZE and MEMORY
are information added at the end of the query execution (or plan creation
for plain
EXPLAIN) in the summary section but at that point the progressive explain
will be
already finished, with no more information in pg_stat_progress_explain.

> TupleDesc tupDesc; /* descriptor for result tuples */
> EState *estate; /* executor's query-wide state */
> PlanState *planstate; /* tree of per-plan-node state */
> + struct ExplainState *pe_es; /* progressive explain state if enabled */

> Should you use the same name pattern here? pestate, for example.

Done.

> PG_LWLOCK(52, SerialControl)
> +PG_LWLOCK(53, ExplainHash)

> Could you use a specific name? Even if you keep the proposed name, you
should
> use ProgressiveExplainHash, ProgressiveExplain or QueryProgress.

Changed to ProgressiveExplainHash.

> You don't need progressiveExplainHashKey. Use pid as key directly.

Done.

> I don't think last_print is accurate because it is not the time the
execution plan
> is printed but the time it was updated. I suggest last_updated_time.

Changed from last_print to last_update. This is still open for discussion.

> I'm wondering why you use "array" in the name. ProgressiveExplainHash is a
> better name.

Used to be compatible with the ProcArray (that is also a hash). But what you
proposed is better indeed. Changed.

> I think you need a better name for PROGRESSIVE_EXPLAIN_ALLOC_SIZE because
it
> doesn't reflect what it is. PROGRESSIVE_EXPLAIN_FREE_SIZE or
> PROGRESSIVE_EXPLAIN_AVAIL_SIZE?

Changed to PROGRESSIVE_EXPLAIN_FREE_SIZE.

Fixed the wrong function names in the comments and changed the format of
those
comments in function headers to be comptible with other functions in
explain.c.

> + /* state related to progressive explains */
> + struct PlanState *pe_curr_node;
> + struct Instrumentation *pe_local_instr;
> + dsa_area *pe_a;

> Could you add some comments saying what each of these variables are for?

Done.

> I didn't experiment but I was wondering if there is a way to avoid the
> duplicates that you added to avoid the overhead.

You mean the local instrumentation object reused for each node?

Rafael.

Attachment	Content-Type	Size
v9-0001-Proposal-for-progressive-explains.patch	application/octet-stream	84.7 KB

In response to

Re: Proposal: Progressive explain at 2025-03-12 14:25:27 from Euler Taveira

Responses

Re: Proposal: Progressive explain at 2025-03-19 17:46:56 from Rafael Thofehrn Castro

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Christoph Berg	2025-03-14 15:39:55	Re: Available disk space per tablespace
Previous Message	David G. Johnston	2025-03-14 15:26:32	Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.