From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Stephen Frost <sfrost(at)snowman(dot)net> |
Cc: | Peter Geoghegan <peter(at)2ndquadrant(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Hash id in pg_stat_statements |
Date: | 2012-10-02 17:16:16 |
Message-ID: | 9844.1349198176@sss.pgh.pa.us |
Lists: | pgsql-hackers |
Stephen Frost <sfrost(at)snowman(dot)net> writes:
> * Peter Geoghegan (peter(at)2ndquadrant(dot)com) wrote:
>> I simply do not understand objections to the proposal. Have I missed something?
> It was my impression that the concern is the stability of the hash value
> and ensuring that tools which operate on it don't mistakenly lump two
> different queries into one because they had the same hash value (caused
> by a change in our hashing algorithm or input into it over time, eg a
> point release). I was hoping to address that to allow this proposal to
> move forward..
I think there are at least two questions that ought to be answered:
1. Why isn't something like md5() on the reported query text an equally
good solution for users who want a query hash?
2. If people are going to accumulate stats on queries over a long period
of time, is a 32-bit hash really good enough for the purpose? If I'm
doing the math right, the chance of collision is already greater than 1%
at 10000 queries, and rises to about 70% for 100000 queries; see
http://en.wikipedia.org/wiki/Birthday_paradox
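The figures in question 2 can be checked with the usual birthday-problem approximation, p ≈ 1 - exp(-n(n-1)/(2N)), where N = 2^32 is the number of possible hash values. The following is only a sketch: it assumes the hash behaves like a uniform random function over distinct queries, and the function name `collision_probability` is made up for illustration.

```python
import math


def collision_probability(n_queries: int, hash_bits: int = 32) -> float:
    """Approximate chance that at least two of n_queries distinct inputs
    share a hash value, assuming a uniformly distributed hash."""
    space = 2 ** hash_bits
    # Birthday approximation: p ~= 1 - exp(-n(n-1) / (2 * space)).
    # expm1 keeps precision when the exponent is very close to zero.
    return -math.expm1(-n_queries * (n_queries - 1) / (2 * space))


for n in (10_000, 100_000):
    print(f"{n:>7} queries: {collision_probability(n):.1%}")
```

Under those assumptions this gives a little over 1% at 10000 queries and just under 70% at 100000, in line with the estimate above. Passing `hash_bits=64` shows how much a wider hash would shrink the risk.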
We discussed this issue and decided it was okay for pg_stat_statements's
internal hash table, but it's not at all clear to me that it's sensible
to use 32-bit hashes for external accumulation of query stats.
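For comparison, the alternative raised in question 1 could look roughly like this on the client side. It is a hedged sketch, not a description of anything pg_stat_statements does: it assumes the tool already has the reported query text as a string, and both the helper name `query_text_key` and the sample query strings are hypothetical.

```python
import hashlib


def query_text_key(query_text: str) -> str:
    """Derive a 128-bit key (32 hex characters) from the reported query text."""
    # MD5 is used here only as a wide, stable fingerprint, not for security.
    return hashlib.md5(query_text.encode("utf-8")).hexdigest()


# Hypothetical example of a reported query text.
reported = "SELECT * FROM orders WHERE id = ?"
print(query_text_key(reported))
```

A key derived this way depends only on the text and the MD5 definition, so it would stay the same as long as the reported text itself does not change, and its 128-bit width makes accidental collisions far less likely than with a 32-bit value.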
regards, tom lane