Re: [survey] New "Stable" QueryId based on normalized query text

From: Julien Rouhaud <rjuju123(at)gmail(dot)com>
To: Evgeniy Efimkin <efimkin(at)yandex-team(dot)ru>
Cc: legrand legrand <legrand_legrand(at)hotmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [survey] New "Stable" QueryId based on normalized query text
Date: 2019-08-12 14:15:31
Message-ID: CAOBaU_Z-5HZQ5Zc71pj0NdshD4vBy84nj8q+Zok37XhHpu7MpA@mail.gmail.com
Lists: pgsql-hackers

On Mon, Aug 12, 2019 at 4:01 PM Evgeniy Efimkin <efimkin(at)yandex-team(dot)ru> wrote:
>
> > One problem with pg_stat_statements' normalized query is that it's not
> > stable: it stores the normalized version of the first query string
> > passed when an entry is created. So you could have different strings
> > depending on whether the query was fully qualified or relying on
> > search_path, for instance.
> I think the normalized query stored in pg_stat_statements is not very important.
> It might look something like this:
> `
> query                 | calls | queryid    | sql_id
> ----------------------+-------+------------+-----------
> Select * from t       |     4 |  762359559 | 680388963
> select * from t       |     7 | 3438533065 | 680388963
> select * from test2.t |     1 |  230362373 | 680388963
> `
> We can strip the schema name in the SQL normalization algorithm.

Not only the schema name but all kinds of qualification, and indeed extra
whitespace. Things get harder for other differences that aren't
meaningful (LIKE vs ~~, IN vs = ANY...). That would also imply that
everyone wants to ignore schemas in query normalization, and I'm not sure
how realistic that is.
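
To make the above concrete, here is a rough sketch of what a purely
text-based normalization might look like. It is only an illustration under
stated assumptions: the names normalize_sql and sql_id are hypothetical, the
hash choice is arbitrary, and none of this is how pg_stat_statements computes
queryid. It folds case, collapses whitespace and strips one level of
qualification, and it does nothing about LIKE vs ~~.

```python
import hashlib
import re

# Hypothetical helpers for illustration only; not part of PostgreSQL
# or of any existing extension.

# Matches "qualifier.name" and captures the unqualified name.
_QUALIFIED_NAME = re.compile(r"\b[a-z_][a-z0-9_]*\.([a-z_][a-z0-9_]*)\b")


def normalize_sql(text: str) -> str:
    """Naive text-level normalization of a query string.

    Caveats: this is purely textual, so it also lowercases string
    literals and quoted identifiers, and it strips alias.column
    references just like schema.table ones.
    """
    text = text.strip().lower()          # fold case
    text = re.sub(r"\s+", " ", text)     # collapse extra whitespace
    text = _QUALIFIED_NAME.sub(r"\1", text)  # drop one level of qualification
    return text


def sql_id(text: str) -> int:
    """32-bit identifier derived from the normalized text (arbitrary hash choice)."""
    digest = hashlib.sha256(normalize_sql(text).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


if __name__ == "__main__":
    for q in ("Select * from t", "select  *  from t", "select * from test2.t"):
        print(sql_id(q), normalize_sql(q))
```

With this sketch the three spellings from the quoted example end up with the
same identifier, while "a LIKE 'x'" and "a ~~ 'x'" still normalize to
different strings. It also shows the policy problem: test1.t and test2.t
become indistinguishable, which not everyone would want.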
