Quick Links

Normalize queries starting with SET for pg_stat_statements

From:	Greg Sabino Mullane <htamfids(at)gmail(dot)com>
To:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Normalize queries starting with SET for pg_stat_statements
Date:	2024-07-22 19:23:50
Message-ID:	CAKAnmmJtJY2jzQN91=2QAD2eAJAA-Per61eyO48-TyxEg-q0Rg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

I saw a database recently where some app was inserting the source port into
the application_name field, which meant that pg_stat_statements.max was
quickly reached and queries were simply pouring in and out of
pg_stat_statements, dominated by some "SET application_name = 'myapp
10.0.0.1:1234'" calls. Which got me thinking, is there really any value to
having non-normalized 'SET application_name' queries inside of
pg_stat_statements? Or any SET stuff, for that matter?

Attached please find a small proof-of-concept for normalizing/de-jumbling
certain SET queries. Because we only want to cover the VAR_SET_VALUE parts
of VariableSetStmt, a custom jumble func was needed. There are a lot of
funky SET things inside of gram.y as well that don't do the standard SET X
= Y formula (e.g. SET TIME ZONE, SET SCHEMA). I tried to handle those as
best I could, and carved a couple of exceptions for time zones and xml.

I'm not sure where else to possibly draw lines. Obviously calls to time
zone have a small and finite pool of possible values, so easy enough to
exclude them, while things like application_name and work_mem are fairly
infinite, so great candidates for normalizing. One could argue for simply
normalizing everything, as SET is trivially fast for purposes of
performance tracking via pg_stat_statements, so who cares if we don't have
the exact string? That's what regular logging is for, after all. Most
importantly, less unique queryids means less chance that errant SETs will
crowd out the more important stuff.

In summary, we want to change this:

SELECT calls, query from pg_stat_statements where query ~ 'set' order by 1;
1 | set application_name = 'alice'
1 | set application_name = 'bob'
1 | set application_name = 'eve'
1 | set application_name = 'mallory'

to this:

SELECT calls, query from pg_stat_statements where query ~ 'set' order by 1;
4 | set application_name = $1

I haven't updated the regression tests yet, until we reach a consensus on
how thorough the normalizing should be. But there is a new test to exercise
the changes in gram.y.

Cheers,
Greg

Attachment	Content-Type	Size
0001-Normalize-queries-starting-with-SET.patch	application/octet-stream	15.7 KB

Responses

Re: Normalize queries starting with SET for pg_stat_statements at 2024-07-23 02:36:53 from Michael Paquier

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Andres Freund	2024-07-22 19:37:30	Re: CI, macports, darwin version problems
Previous Message	Heikki Linnakangas	2024-07-22 19:15:37	Re: Latches vs lwlock contention