Re: pg_stat_statements fingerprinting logic and ArrayExpr

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Peter Geoghegan <pg(at)heroku(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_stat_statements fingerprinting logic and ArrayExpr
Date: 2013-12-10 22:46:56
Message-ID: CA+Tgmob=1BJp-w3DobvT50WPGoG7_CdVXbZz_LY7=rkN0f_sCQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Dec 10, 2013 at 5:38 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2013-12-10 14:30:36 -0800, Peter Geoghegan wrote:
>> Did you really find pg_stat_statements to be almost useless in such
>> situations? That seems worse than I thought.
>
> It's very hard to see where you should spend efforts when every "logical
> query" is split into hundreds of pg_stat_statement entries. Suddenly
> it's important whether a certain counts of parameters are more frequent
> than others because in the equally distributed cases they fall out of
> p_s_s again pretty soon. I think that's probably a worse than average
> case, but certainly not something only I could have the bad fortune of
> looking at.

Right, but the flip side is that you could collapse things that people
don't want collapsed. If you've got lots of query that differ only in
that some of them say user_id IN (const1, const2) and others say
user_id IN (const1, const2, const3) and the constants vary a lot, then
of course this seems attractive. On the other hand if you have two
queries and one of them looks like this:

WHERE status IN ('active') AND user_id = ?

and the other looks like this:

WHERE status IN ('inactive', 'deleted') AND user_id = ?

...it might actually annoy you to have those two things conflated;
it's easy to imagine one having much different performance
characteristics than the other.

Part of me wonders if the real solution here is to invent a way to
support an arbitrarily large hash table of entries efficiently, and
then let people do further roll-ups of the data in userland if they
don't like our rollups. Part of the pain here is that when you
overflow the hash table, you start losing information that can't be
recaptured after the fact. If said hash table were by chance also
suitable for use as part of the stats infrastructure, in place of the
whole-file-rewrite technique we use today, massive win.

Of course, even if we had all this, it necessarily make doing
additional rollups *easy*; it's easy to construct cases that can be
handled much better given access to the underlying parse tree
representation than they can be with sed and awk. But it's a thought.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2013-12-10 22:55:09 Re: pg_stat_statements fingerprinting logic and ArrayExpr
Previous Message Peter Geoghegan 2013-12-10 22:42:27 Re: pg_stat_statements fingerprinting logic and ArrayExpr