Re: BUG #17668: Query normalization generates multiple queryId:s for calls to the same procedure

From: Alexey Ermakov <alexey(dot)ermakov(at)dataegret(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Julien Rouhaud <rjuju123(at)gmail(dot)com>
Cc: marcus(dot)kempe(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #17668: Query normalization generates multiple queryId:s for calls to the same procedure
Date: 2022-10-28 07:42:10
Message-ID: 06ff22d7-e92a-f97b-8b91-512de71aea9f@dataegret.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hello, I've seen possibly related case where that might be a problem.

PostgreSQL 12.8, pg_stat_statements.max=10000. There are queries like these:
insert into tablename (col1, ..., col28) values ($1, ..., $28), ...
($2605, ... , $2660) on conflict do nothing.

Problem is that not only we might insert different number of rows but
there might be many duplicates of the same exact query in
pg_stat_statements (userid and dbid are the same in all rows):

select md5(query), count(*), pg_size_pretty(sum(length(query))) from
pg_stat_statements
where query like 'insert into tablename%' group by 1 order by 2 desc
limit 10;

   md5    | count | pg_size_pretty
----------+-------+----------------
 170...84 |  1283 | 2679 kB
 906...53 |   897 | 1724 kB
 55e...78 |   676 | 12 MB
 ea4...ef |   642 | 11 MB
 944...0f |   629 | 11 MB
 04e...ec |   530 | 9138 kB
 572...27 |   476 | 8575 kB
 5ea...3c |   430 | 7331 kB
 be1...6c |   324 | 5704 kB
 f69...47 |   313 | 550 kB

There are 8 bigint columns in the table, I suspect that we might insert
into some of them values both greater and less than INT_MAX.
In result pg_stat_statements buffer is full of such queries (they take
9k rows out of 10k right now, 10h since pg_stat_statements_reset call),
currently total sum of query lengths is 117MB (97% of total sum).

From monitoring perspective that might be a problem since calculating
metrics from pg_stat_statements became more expensive and we might lose
information about some queries when there is not enough space for them.

But I suppose such cases are pretty rare and some of them could be fixed
with COPY queries instead of inserts.

--
Thanks,
Alexey Ermakov

On 2022-10-28 00:28, Tom Lane wrote:
> Julien Rouhaud <rjuju123(at)gmail(dot)com> writes:
>> On Thu, Oct 27, 2022 at 01:49:18PM +0000, PG Bug reporting form wrote:
>>> I would have expected both calls to be normalized and fingerprinted to one
>>> and the same queryId.
>> Agreed, and that's actually a known problem that is currently being worked on.
>> You can look at
>> https://www.postgresql.org/message-id/flat/36e5bffe-e989-194f-85c8-06e7bc88e6f7%40amazon.com
>> for details about the discussion and in-progress fix.
> Um ... that seems unrelated. AFAICS the OP is complaining about
> the fact that '42'::int4::int8 is not identical to '42'::int8.
> Well, they're not. I seriously doubt that we would or should
> consider trying to get queryjumble to mask that.
>
> regards, tom lane
>
>

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message PG Bug reporting form 2022-10-28 07:48:50 BUG #17670: Logical Replication data may be lost on the subscription under certain scenarios
Previous Message Marcus Kempe 2022-10-28 04:48:58 Re: BUG #17668: Query normalization generates multiple queryId:s for calls to the same procedure