Re: pg_stat_statements fingerprinting logic and ArrayExpr

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Peter Geoghegan <pg(at)heroku(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_stat_statements fingerprinting logic and ArrayExpr
Date: 2013-12-10 22:55:38
Message-ID: 20131210225538.GC7730@awork2.anarazel.de
Lists: pgsql-hackers

On 2013-12-10 17:46:56 -0500, Robert Haas wrote:
> On Tue, Dec 10, 2013 at 5:38 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > On 2013-12-10 14:30:36 -0800, Peter Geoghegan wrote:
> >> Did you really find pg_stat_statements to be almost useless in such
> >> situations? That seems worse than I thought.
> >
> > It's very hard to see where you should spend effort when every "logical
> > query" is split into hundreds of pg_stat_statements entries. Suddenly
> > it matters whether certain counts of parameters are more frequent
> > than others, because in the equally distributed case they fall out of
> > p_s_s again pretty soon. I think that's probably a worse-than-average
> > case, but certainly not something only I could have the bad fortune of
> > looking at.
>
> Right, but the flip side is that you could collapse things that people
> don't want collapsed. If you've got lots of queries that differ only in
> that some of them say user_id IN (const1, const2) and others say
> user_id IN (const1, const2, const3), and the constants vary a lot, then
> of course this seems attractive.

Yea, completely agreed. It might also lead users to miss the fact
that their precious prepared-statement cache is using up loads of
backend memory while individual prepared statements are seldom
re-executed because there are so many...
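
To make the collapsing behaviour concrete, here's a minimal sketch
(Python; illustrative only, since the real pg_stat_statements jumble
code hashes the post-parse-analysis tree inside the server, not the
query text):

```python
import hashlib
import re


def fingerprint(sql: str) -> str:
    # Illustrative only: collapse any IN (...) constant list down to a
    # single placeholder, then hash the normalized text. The point is
    # that IN-lists of different lengths map to one entry.
    normalized = re.sub(r"\bIN\s*\([^)]*\)", "IN (?)", sql,
                        flags=re.IGNORECASE)
    return hashlib.sha1(normalized.encode()).hexdigest()


a = fingerprint("SELECT * FROM t WHERE user_id IN (1, 2)")
b = fingerprint("SELECT * FROM t WHERE user_id IN (1, 2, 3)")
c = fingerprint("SELECT * FROM t WHERE status IN ('active') AND user_id = $1")
d = fingerprint("SELECT * FROM t WHERE status IN ('inactive', 'deleted') AND user_id = $1")

assert a == b  # different-length IN-lists collapse to one entry
assert c == d  # but semantically distinct constant sets collapse too
```

Note that the same normalization also merges the 'active' versus
'inactive'/'deleted' case, which is exactly the kind of collapse people
may not want.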

> On the other hand if you have two
> queries and one of them looks like this:
>
> WHERE status IN ('active') AND user_id = ?
>
> and the other looks like this:
>
> WHERE status IN ('inactive', 'deleted') AND user_id = ?

That too.

> Part of me wonders if the real solution here is to invent a way to
> support an arbitrarily large hash table of entries efficiently, and
> then let people do further roll-ups of the data in userland if they
> don't like our rollups. Part of the pain here is that when you
> overflow the hash table, you start losing information that can't be
> recaptured after the fact. If said hash table were by chance also
> suitable for use as part of the stats infrastructure, in place of the
> whole-file-rewrite technique we use today, that would be a massive win.
>
> Of course, even if we had all this, it wouldn't necessarily make doing
> additional rollups *easy*; it's easy to construct cases that can be
> handled much better with access to the underlying parse tree
> representation than they can be with sed and awk. But it's a thought.

That would obviously be neat, but I have roughly no clue how to achieve
it. Granular control over how such rollups work sounds very hard to
achieve unless that granular control simply means getting passed a tree
and returning another.
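
Purely as illustration, that tree-in/tree-out interface might look
roughly like the sketch below (Python, with hypothetical dict-based node
structures; nothing here is an existing Postgres API, and a real hook
would of course operate on the actual parse tree):

```python
# Hypothetical user-supplied rollup: receives a simplified tree node
# and returns a rewritten one to be fingerprinted instead.

def collapse_array_expr(node):
    # Recursively replace every ArrayExpr element list with a single
    # marker, so IN-lists of any length roll up to one entry.
    if isinstance(node, dict):
        if node.get("type") == "ArrayExpr":
            return {"type": "ArrayExpr", "elements": ["?"]}
        return {k: collapse_array_expr(v) for k, v in node.items()}
    if isinstance(node, list):
        return [collapse_array_expr(v) for v in node]
    return node


# "user_id = ANY(ARRAY[1, 2, 3])", as an IN-list is represented internally
tree = {"type": "OpExpr", "op": "=ANY",
        "args": [{"type": "Var", "name": "user_id"},
                 {"type": "ArrayExpr", "elements": [1, 2, 3]}]}
rolled = collapse_array_expr(tree)
```

The Var side is left untouched, so distinct columns still get distinct
entries; only the constant list is collapsed.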

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
