Re: Generating code for query jumbling through gen_node_support.pl

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com>
Subject: Re: Generating code for query jumbling through gen_node_support.pl
Date: 2023-02-08 06:47:51
Message-ID: Y+NFl6N1QWb4ftUy@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Feb 07, 2023 at 05:32:07PM -0500, Tom Lane wrote:
> I have just noticed that this patch is generating useless jumbling
> code for node types such as Path nodes and other planner infrastructure
> nodes. That no doubt contributes to the miserable code coverage rating
> for queryjumblefuncs.*.c, which have enough dead lines to drag down the
> overall rating for all of backend/nodes/. Shouldn't a little more
> attention have been paid to excluding entire node classes if they can
> never appear in Query?

This one was intentional to let extensions play with jumbling of such
nodes, but perhaps you are right that it makes little sense at this
stage. If there is an ask for it later, though.. Using
shared_preload_libraries = pg_stat_statements and compute_query_id =
regress shows that numbers go up to 60% for funcs.c and 30% for
switch.c. Removing nodes like as of the attached brings these numbers
respectively up to 94.5% and 93.5% for a check. With a check-world, I
measure respectively 96.7% and 96.1% because there is more coverage
for extensions, ALTER SYSTEM and database commands, roughly.

This could also be a file-level policy by enforcing no_query_jumble in
gen_node_support.pl by looking at the header name, still I favor
no_query_jumble to keep all the pg_node_attr() in a single area with
the headers. Note that the attached includes in 0002 the tweak to
enforce the computation with compute_query_id if you want to test it
yourself and check my numbers. This is useful IMO as we could detect
missing nodes for all queries (utilities or not), still doing this
change may deserve a separate discussion. Note that I am not seeing
any "unrecognized node type" in any of the logs for any queries.

As a side note, should we be more aggressive with the tests related to
the jumbling code since it is now in core? For example, XmlExpr or
MinMaxExpr, which are part of Query nodes that can be jumbled even in
older branches, have zero coverage by default as
coverage.postgresql.org reports, because everything goes through
pg_stat_statements and it includes no queries with such nodes. My
buildfarm member batta makes sure to stress that no nodes are missing
by overloading pg_stat_statements and compute_query_id = regress in
the configuration, so no nodes are missing from the computation, still
the coverage could be better across the board. Expanding the tests of
pg_stat_statements is needed in this area for some time, still could
there be a point in switching compute_query_id = regress so as it is a
synonym of "on" without the EXPLAIN output rather than "auto"? If the
setting is enforced by pg_regress, the coverage of queryjumble.c would
be so much better, at least..
--
Michael

Attachment Content-Type Size
0001-Mark-more-nodes-with-node-attribute-no_query_jumble.patch text/x-diff 26.5 KB
0002-Switch-compute_query_id-regress-to-mean-on-and-force.patch text/x-diff 2.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2023-02-08 07:01:03 Re: Generating code for query jumbling through gen_node_support.pl
Previous Message Andres Freund 2023-02-08 06:38:14 Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)