From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_stat_statements fingerprinting logic and ArrayExpr
Date: 2013-12-11 00:23:59
Message-ID: CAM3SWZQvGsRaOLD6B5Qvda4qj0OGfUH4zJjdGuKSuc43QCmFHg@mail.gmail.com
Lists: pgsql-hackers
On Tue, Dec 10, 2013 at 3:08 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> A different point of view is that it's more or less an implementation
> artifact that pg_stat_statements doesn't already see the cases as
> equivalent; that happens only because it looks at the querytree before
> the planner gets around to constant-folding ARRAY[1,2,3] into the single
> Const '{1,2,3}'::int[].
Right. It's also an implementation artifact, for example, that
pg_stat_statements sees two queries as distinct based solely on the
choice of JOIN syntax for (arguably) semantically equivalent inner
join queries. In a perfect world, I guess it would see them as
non-distinct.
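To make that concrete, here is a minimal illustration (the table and column names are hypothetical):

    -- Semantically equivalent, but fingerprinted as distinct entries today,
    -- because the fingerprint is taken before the planner folds ARRAY[1, 2, 3]
    -- into the single Const '{1,2,3}'::int[]:
    SELECT * FROM orders WHERE id = ANY (ARRAY[1, 2, 3]);
    SELECT * FROM orders WHERE id = ANY ('{1,2,3}'::int[]);

    -- Likewise, these inner joins differ only in syntax, yet get separate entries:
    SELECT o.* FROM orders o JOIN customers c ON c.id = o.customer_id;
    SELECT o.* FROM orders o, customers c WHERE c.id = o.customer_id;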
> So my objection to what Peter is suggesting is not that it's a bad idea
> in isolation, but that I don't see where he's going to stop, short of
> reinventing every query-normalization behavior that exists in the planner.
> If this particular case is worthy of fixing with a hack in the
> fingerprinter, aren't there going to be dozens more with just as good
> claims? (Perhaps not, but what's the basis for thinking this is far
> worse than any other normalization issue?)
The basis is simply that I've heard a number of complaints about it
(I'm shocked at just how big a problem Andres considers this to be),
and yet I've heard no complaints about any of the other cases. No one
cares about those, because those are distinctions that applications
and application frameworks are not at all inclined to unpredictably
inject. Doing some string interpolation to build what will become an
ArrayExpr during parse analysis is a common thing. But unpredictably
switching from one syntax to another equivalent one (like varying JOIN
syntax), with the relevant user code traceable to a single point in
the application, simply doesn't happen in the real world.
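For example, an application that interpolates a list of ids into its query text might issue each of the following over time; with the current fingerprinting they become three separate entries, differing only in the number of ArrayExpr elements (again, hypothetical names):

    SELECT * FROM orders WHERE customer_id IN (1, 2);
    SELECT * FROM orders WHERE customer_id IN (1, 2, 3);
    SELECT * FROM orders WHERE customer_id IN (1, 2, 3, 4);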
> I'm wondering whether this doesn't indicate that we need to rethink where
> the fingerprinter has been plugged in. I'm not sure that somewhere in
> the planner, post-constant-folding, would be a better place; but it's
> worth thinking about.
Hardly seems worth the trouble. A nice property of pg_stat_statements
is that it has good temporal locality; fingerprinting is just an extra
step of parse analysis. I don't think that this would necessarily be
lost by what you outline, but it'd sure be a concern.
We had this discussion, or at least a similar one, back when the
normalization work was underway (fingerprint the raw parse tree vs. the
post-parse-analysis tree vs. the tree after rule expansion vs. the plan tree). My
position at the time was that assigning query execution costs at the
plan granularity is a fine thing, but doing so at the query
granularity is also a fine thing, which is more immediately useful. I
believe this so strongly that I went so far as to write a derivative
of pg_stat_statements to do the fingerprinting at the plan granularity
(while expanding that in a more experimental direction, towards
querying the structure of plans, relatively untested ideas not well
suited to even a contrib module). I could certainly see us
instrumenting plan costs as a separate, perhaps complementary thing,
but it should not happen at the expense of simple blaming of costs on
queries.
So, if you're not talking about blaming plans rather than queries, you
must be talking about making the planner do the constant folding in a
way that facilitates tools like pg_stat_statements in getting the
"essential nature" of the query (*not* the plan) post constant
folding. Which doesn't seem like a good allocation of resources, since
as I said this is the sole complaint of this nature that I've heard,
and people talk to me about pg_stat_statements a lot.
The role of pg_stat_statements is to instrument statements: to assign
costs traceable back to a point in the application. This implementation
artifact makes it harder to trace costs back to that point, because the
entries for less frequently seen numbers of ArrayExpr elements may be
evicted while others are not, and so on.
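To put it another way, with the hypothetical IN-list queries above, what you end up looking at is something like this (a rough sketch, using the view's query, calls and total_time columns):

    SELECT query, calls, total_time
      FROM pg_stat_statements
     WHERE query LIKE '%customer_id IN (%'
     ORDER BY calls DESC;
    -- One row per distinct element count, each competing separately for a slot,
    -- so the rarer variants can be evicted while others survive, splitting and
    -- partially losing the totals for what is really one application query.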
I just discussed the Django issue with Jacob Kaplan-Moss, Heroku's
director of security, who is on the Django core team. He's going to
take a look at it, and I wouldn't be surprised if it was fixed, but
that's hardly a scalable model.
--
Peter Geoghegan