Re: pg_stat_statements and "IN" conditions

From: Sami Imseih <samimseih(at)gmail(dot)com>
To: Álvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Sergei Kornilov <sk(at)zsrv(dot)org>, yasuo(dot)honda(at)gmail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, smithpb2250(at)gmail(dot)com, vignesh21(at)gmail(dot)com, michael(at)paquier(dot)xyz, nathandbossart(at)gmail(dot)com, stark(dot)cfm(at)gmail(dot)com, geidav(dot)pg(at)gmail(dot)com, marcos(at)f10(dot)com(dot)br, robertmhaas(at)gmail(dot)com, david(at)pgmasters(dot)net, pgsql-hackers(at)postgresql(dot)org, pavel(dot)trukhanov(at)gmail(dot)com, Sutou Kouhei <kou(at)clear-code(dot)com>
Subject: Re: pg_stat_statements and "IN" conditions
Date: 2025-02-12 19:01:15
Message-ID: CAA5RZ0tf_6m3AJyJKCVQbtByUveEvQ25LcjHKd3QtYs7mJnr5A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> The nastiness level of this seems quite low, compared to what happens to
> this other example if we didn't handle these easy cases:
>
> create table t (a float);
> select i from t where i in (1, 2);
> select i from t where i in (1, '2');
> select i from t where i in ('1', 2);
> select i from t where i in ('1', '2');
> select i from t where i in (1.0, 1.0);
>
> (The point here is that the datatype differs for the constants from the
> lexer down in each of these cases.)
>
> I think it's more important to handle this better than what the posted
> patch does, than improving the lexing in presence of other lexical
> elements in the query. With the current patch I get _five_
> pg_stat_statements entries from these queries above, where only one of
> them was able to apply merging of the elements:

I want to throw out an idea I was looking at today, if I may, and
apologies if this
was brought up earlier. The approachI am looking at is to have a
custom jumbling
function for ArrayExpr to handle FuncExpr and Const. The simple Const
case is quite simple,
but in the case of FuncExpr we can traverse the "args" to find the constants
and simple record the location. I don't think this adds extra overhead as
this work to traverse the "args" must already occur.

I see a few benefits with this approach:

1/ It deals with the different constant types in the IN-LIST
2/ It also deals with an IN-LIST that contains a subquery with an IN-LIST. Maybe
I am wrong, but the current patch does not support this case.

The negative aspect of this patch is the custom jumbling, but we have
examples of that for other expression types.

select pg_stat_statements_reset();
drop table if exists foo;
create table foo (col_int int, col_smallint smallint, col_bigint
bigint, col_float float, col_text text, col_varchar varchar);
select from foo where col_int in (1, 2, 3);
select from foo where col_int in (1, 2, 3, 4);
select from foo where col_int in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, '11');
select from foo where col_bigint in (1, 2, 3);
select from foo where col_bigint in (1, 2, 3, 4);
select from foo where col_bigint in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, '11');
select from foo where col_int in ((select 1 from foo where col_int in
(1, 2, 3, 4, 5)));
select from foo where col_int in ((select 1 from foo where col_int in
(1, 2, 3, 4, 5, 6, '7')));
select from foo where col_bigint in ((select 1 from foo where col_int
in (1, 2, 3, 4, 5)));
select from foo where col_bigint in ((select 1 from foo where col_int
in (1, 2, 3, 4, 5, 6, '7')));
select query, queryid, calls from pg_stat_statements where query like
'select from foo where%' order by stats_since asc;

query
| queryid | calls
--------------------------------------------------------------------------------------------------+----------------------+-------
select from foo where col_int in ($1, $2, $3)
| -8353422683112638164 | 3
select from foo where col_bigint in ($1, $2, $3)
| -1245684714076243746 | 3
select from foo where col_int in ((select $1 from foo where col_int
in ($2, $3, $4, $5, $6))) | 6337165648188018609 | 2
select from foo where col_bigint in ((select $1 from foo where
col_int in ($2, $3, $4, $5, $6))) | -1521638960953725050 | 2
(4 rows)

I will park the poc of the idea here. It does not deal with the presentation
as the current patch, but maybe it could be something to work from.

Regards,

Sami

Attachment Content-Type Size
experiment_custom_array_expr.patch application/octet-stream 2.9 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2025-02-12 19:08:35 Re: BitmapHeapScan streaming read user and prelim refactoring
Previous Message Jeff Davis 2025-02-12 18:56:51 Re: Optimization for lower(), upper(), casefold() functions.