From: | Álvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
---|---|
To: | Dmitry Dolgov <9erthalion6(at)gmail(dot)com> |
Cc: | Sami Imseih <samimseih(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Sergei Kornilov <sk(at)zsrv(dot)org>, yasuo(dot)honda(at)gmail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, smithpb2250(at)gmail(dot)com, vignesh21(at)gmail(dot)com, michael(at)paquier(dot)xyz, nathandbossart(at)gmail(dot)com, stark(dot)cfm(at)gmail(dot)com, geidav(dot)pg(at)gmail(dot)com, marcos(at)f10(dot)com(dot)br, robertmhaas(at)gmail(dot)com, david(at)pgmasters(dot)net, pgsql-hackers(at)postgresql(dot)org, pavel(dot)trukhanov(at)gmail(dot)com, Sutou Kouhei <kou(at)clear-code(dot)com> |
Subject: | Re: pg_stat_statements and "IN" conditions |
Date: | 2025-02-12 18:39:39 |
Message-ID: | 202502121839.vjr3jm6yb35h@alvherre.pgsql |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 2025-Feb-12, Dmitry Dolgov wrote:
> I've been experimenting with this today, and while it's easy to
> implement,
Great.
> there is one annoying thing for which I don't have a solution
> yet. When generating a normalized version for such merged queries in
> pgss we rely on expression location, something like:
>
> select i from t where i in (a1, a2, a3, ..., aN);
> | |
> expr loc1 expr loc2
>
> We remember loc1 and loc2, then do not copy anything between then into
> the normalized query. Now, the expression location is only known up to
> the parsing token, without taking into account e.g. parenthesis in more
> complex expressions. Which means we don't know exactly where an
> expression starts or ends, and it's hard to correctly represent queries
> like:
>
> select i from t where i in (((a1)), ((a2)), ((a3)), ..., ((aN)));
> | |
> expr loc1 expr loc2
>
> The normalized version looks like this:
>
> select i from t where i in (((...)));
>
> While it does not affect the actual functionality and is purely
> cosmetic, it's quite visible and causes questions.
The nastiness level of this seems quite low, compared to what happens to
this other example if we didn't handle these easy cases:
create table t (a float);
select i from t where i in (1, 2);
select i from t where i in (1, '2');
select i from t where i in ('1', 2);
select i from t where i in ('1', '2');
select i from t where i in (1.0, 1.0);
(The point here is that the datatype differs for the constants from the
lexer down in each of these cases.)
I think it's more important to handle this better than what the posted
patch does, than improving the lexing in presence of other lexical
elements in the query. With the current patch I get _five_
pg_stat_statements entries from these queries above, where only one of
them was able to apply merging of the elements:
queryid │ query
──────────────────────┼─────────────────────────────────────
-5783267088740508246 │ select i from t where i in ($1, $2)
6446023427308995149 │ select i from t where i in ($1, $2)
3778867339896201523 │ select i from t where i in (...)
-8733218180609156532 │ select i from t where i in ($1, $2)
-5106509834475638715 │ select i from t where i in ($1, $2)
If I understand what you're saying, it's that the extra parenthesis
cause the recorded query text be a little uglier (but the queryid still
ends up being one and the same for all queries), which seems much less
of a problem. I'm okay saying that cases like that can be improved
later. (It seems to me that you want to improve the way we pass the
lexed string down to pg_stat_statements, and frankly that even seems a
different problem altogether.)
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2025-02-12 18:40:15 | Re: explain analyze rows=%.0f |
Previous Message | Tom Lane | 2025-02-12 18:35:20 | Re: Small memory fixes for pg_createsubcriber |