From: Rafael Thofehrn Castro <rafaelthca(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal: Progressive explain
Date: 2024-12-31 16:23:03
Message-ID: CAG0ozMrtK_u8Uf5KNZUmRNuMphV5tnC5DEhRBNRGK+K4L506xw@mail.gmail.com
Lists: pgsql-hackers
Thanks Greg, Sami and Jian for the feedback so far.
> Maybe track_explain instead? In the spirit of track_activity.
That was the original name, and all the other GUCs followed the
track_activity_* naming. I changed it to the name of the feature after
discussion with colleagues at EDB. This is definitely open for
discussion.
> 4096 seems low, if this means the explain plan is truncated at
> that size. Also, the 100 minimum seems arbitrary.
Min (100) and max (1048576) are the same as the values for the GUC
track_activity_query_size, which has a very similar purpose: it
controls the size of the pg_stat_activity.query column.
> So we can enable verbose and settings - but not wal? I could see
> that one being useful. Not so much the rest (timing, summary). And
> buffers has recently changed, so no need to worry about that. :)
The logic I used for choosing which explain options get a GUC is that
none of them should change QueryDesc->instrument_options, which would
change the instrumentation applied to the actual execution. The GUCs
that are available modify only the ExplainState object, which affects
only the output printed to pg_stat_progress_explain.
> Hmmm...don't have a solution/suggestion offhand, but using max_connections
> would seem to be allocating a chunk of memory that is never
> used 99% of the time, as most people don't run active queries
> near max_connections.
> (Actually, on re-reading my draft, I would prefer a rotating pool
> like pg_stat_statements does.)
Agreed. I thought about using logic similar to pg_stat_statements,
but the pool size there is very large by default (5000). The
difference is that pg_stat_statements keeps the data on disk, and I
wanted to avoid that since instrumented queries can print new plans
very often, affecting performance.
Maybe one idea would be to include a new GUC (progressive_explain_max_size)
that controls how many rows explainArray can have. If the limit is
reached, a backend won't print anything in that iteration.
> It's not clear if total_explain_time is now() - query_start or something
> else. If not, I would love to see an elapsed time interval column.
total_explain_time is the accumulated time spent printing the plan.
It does not include execution time.
> Perhaps add a leader_pid column. That's something I would always
> be joining with pg_stat_activity to find out.
For prints done by parallel workers? That information is available
in pg_stat_activity. The idea is to use the pid column and join with
pg_stat_activity to get all other relevant details.
> I do not think however this instrumentation should only be
> made available if a user runs EXPLAIN ANALYZE.
> In my opinion, this will severely limit the usefulness of this
> instrumentation in production. Of course, one can use auto_explain,
> but users will be hesitant to enable auto_explain with analyze in
> production for all their workloads. Also, there should not be an
> auto_explain dependency for this feature.
> One approach will be for the view to expose the
> explain plan and the current node being executed. I think the
> plan_node_id can be exposed for this purpose but have not looked
> into this in much detail yet. The plan_node_id can then be used
> to locate the part of the plan that is a potential bottleneck ( if that
> plan node is the one constantly being called ).
You mean that we could include the current node being executed even
for non-instrumented runs? In that case it would print the plain
plan plus the current node? That is a valid point and shouldn't be
difficult to implement. The problem is that this would require
adding overhead to (non-instrumented) ExecProcNode(), and that
can be a performance killer.
> This may also be work that is better suited for an extension, but
> core will need to add a hook in ExecProcNode so an extension can
> have access to PlanState.
Are you talking about implementing your proposal (also printing the
plan with the current node for non-instrumented runs) as an extension,
or this whole patch as an extension?
If the whole patch, I thought about that. The thing is that the
proposal also changes the ExplainNode() function, the core function
used to print a plan. To implement it as an extension we would have
to duplicate 95% of that code.
I do think there is merit in having this feature in core and using
existing extensions (auto_explain, for example) to augment it, for
instance via your suggestion of a hook in ExecProcNode().
> compile failed. the above is the error message.
Thanks. It was indeed a missing include. It only showed up in a
complete build (including contrib), so I failed to catch it.
Sending a second version with the fix.
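(Since TimestampTz is declared in datatype/timestamp.h, the fix is
presumably along these lines in src/include/commands/explain.h;
shown for context, the actual fix is in the attached v2 patch:)

```c
#include "datatype/timestamp.h"   /* for TimestampTz */
```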
Rafael.
On Tue, Dec 31, 2024 at 3:00 AM jian he <jian(dot)universality(at)gmail(dot)com> wrote:
> hi.
>
> [48/208] Compiling C object
> contrib/postgres_fdw/postgres_fdw.so.p/postgres_fdw.c.o
> FAILED: contrib/postgres_fdw/postgres_fdw.so.p/postgres_fdw.c.o
> /usr/local/gcc-14.1.0/bin/gcc-14.1.0
> -Icontrib/postgres_fdw/postgres_fdw.so.p -Isrc/include
> -I../../Desktop/pg_src/src5/postgres/src/include
> -Isrc/interfaces/libpq
> -I../../Desktop/pg_src/src5/postgres/src/interfaces/libpq
> -I/usr/include/libxml2 -fdiagnostics-color=always --coverage
> -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Werror -g
> -fno-strict-aliasing -fwrapv -fexcess-precision=standard -D_GNU_SOURCE
> -Wmissing-prototypes -Wpointer-arith -Werror=vla -Wendif-labels
> -Wmissing-format-attribute -Wimplicit-fallthrough=3
> -Wcast-function-type -Wshadow=compatible-local -Wformat-security
> -Wdeclaration-after-statement -Wmissing-variable-declarations
> -Wno-format-truncation -Wno-stringop-truncation -Wunused-variable
> -Wuninitialized -Werror=maybe-uninitialized -Wreturn-type
> -DWRITE_READ_PARSE_PLAN_TREES -DCOPY_PARSE_PLAN_TREES
> -DREALLOCATE_BITMAPSETS -DLOCK_DEBUG -DRELCACHE_FORCE_RELEASE
> -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS
> -DRAW_EXPRESSION_COVERAGE_TEST -fno-omit-frame-pointer -fPIC -pthread
> -fvisibility=hidden -MD -MQ
> contrib/postgres_fdw/postgres_fdw.so.p/postgres_fdw.c.o -MF
> contrib/postgres_fdw/postgres_fdw.so.p/postgres_fdw.c.o.d -o
> contrib/postgres_fdw/postgres_fdw.so.p/postgres_fdw.c.o -c
> ../../Desktop/pg_src/src5/postgres/contrib/postgres_fdw/postgres_fdw.c
> In file included from
> ../../Desktop/pg_src/src5/postgres/contrib/postgres_fdw/postgres_fdw.c:22:
> ../../Desktop/pg_src/src5/postgres/src/include/commands/explain.h:86:9:
> error: unknown type name ‘TimestampTz’
> 86 | TimestampTz last_explain;
> | ^~~~~~~~~~~
> [58/188] Linking target contrib/sslinfo/sslinfo.so
> ninja: build stopped: subcommand failed.
>
> compile failed. the above is the error message.
>
Attachment: v2-0001-Proposal-for-progressive-explains.patch (application/octet-stream, 57.9 KB)