Re: Set query_id for query contained in utility statement

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>
Cc: jian he <jian(dot)universality(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Set query_id for query contained in utility statement
Date: 2024-10-22 05:06:16
Message-ID: ZxcyyGaTy_5FmORw@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Oct 21, 2024 at 10:35:17AM +0200, Anthonin Bonnefoy wrote:
> I've updated 0001 to only use ORDER BY query. The query strings are
> not exact doublons, as the nested statement has the additional ending
> ';' due to using the whole string instead of just the RawStmt. Thus,
> the other sort expressions will never be used since there's no
> equality. There's also the possibility of breaking down each statement
> in individual blocks, with pgss reset and fetch for each one. However,
> I feel it's gonna add a lot of noise in the test file.

I've looked at 0001, and finished by splitting the case of all-level
tracking with the multi-statements as the resulting table was feeling
heavily bloated, particularly because of MERGE that spawned in
multiple lines even if there were less entries. The rest, except for
some styling inconsistencies, was feeling OK.

One of the multi-statement tests includes this output for HEAD, and
that's on two PGSS entries:
EXPLAIN (COSTS OFF) SELECT $1, $2 UNION SELECT $3, $4;
EXPLAIN (COSTS OFF) (SELECT 1, 2, 3) UNION SELECT 3, 4, 5;

EXPLAIN (COSTS OFF) SELECT 1, 2 UNION SELECT 3, 4;
EXPLAIN (COSTS OFF) (SELECT $1, $2, $3) UNION SELECT $4, $5, $6;

I did not notice that first, but that's really something!
Normalization is only applied partially to a portion of the string, so
we have a bunch of bloat for non-top queries that has existed for
years.

+ ParseLoc stmt_location; /* start location, or -1 if unknown */
+ ParseLoc stmt_len; /* length in bytes; 0 means "rest of string" */

I'm OK with this approach after considering a few things, mostly in
terms of consistency with the existing node structures. The existing
business with YYLLOC_DEFAULT() counts here.

-Query *
-transformTopLevelStmt(ParseState *pstate, RawStmt *parseTree)

What's the advantage of removing this routine? Is that because you're
pushing the initialization of stmt_location and stmt_len into
transformOptionalSelectInto(), making it mostly moot? Something that
worries me a bit is that this changes makes the code less clean, by
having a SELECT-INTO specific routine called in the parse-analyze
paths, while adding three individual paths in charge of setting
pstate->p_stmt_len and p_stmt_location.

+ n->stmt_len = @3 - @2;

This specific case deserves a comment. I don't have the best
understanding of this area yet (need more analysis), but With the
existing updateRawStmtEnd() and RawStmt also tracking this
information, I am wondering if we could be smarter with less paths
manipulating the start locations and lengths. And your patch adds a
new setQueryStmtLen() that does the same kind of job. Hmm.

FWIW, I don't feel strongly about 0004 that tries to make the REFRESH
handling smarter. I am not sure that it even makes sense as-is by
hacking into a wrapper of pg_get_viewdef_worker() to get the query
string, leading it to not be normalized. This has also a small
footprint in 0003. I think that the bits in ExecRefreshMatView()
should be discarded from 0003, as a result.
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2024-10-22 05:09:03 Re: Set query_id for query contained in utility statement
Previous Message Tatsuo Ishii 2024-10-22 04:53:43 Re: Row pattern recognition