From: | "ktm(at)rice(dot)edu" <ktm(at)rice(dot)edu> |
---|---|
To: | Tomas Vondra <tv(at)fuzzy(dot)cz> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Re: custom hash-based COUNT(DISTINCT) aggregate - unexpectedly high memory consumption |
Date: | 2013-10-07 12:50:23 |
Message-ID: | 20131007125023.GQ16128@aart.rice.edu |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, Oct 07, 2013 at 12:41:58AM +0200, Tomas Vondra wrote:
> > 2. Consider using a simpler/faster hash function, like FNV[1] or Jenkins[2].
> > For fun, try not hashing those ints at all and see how that performs (that,
> > I think, is what you get from HashSet<int> in Java/C#).
>
> I've used crc32 mostly because it's easily available in the code (i.e.
> I'm lazy), but I've done some quick research / primitive benchmarking
> too. For example hashing 2e9 integers takes this much time:
>
> FNV-1 = 11.9
> FNV-1a = 11.9
> jenkins = 38.8
> crc32 = 32.0
>
> So it's not really "slow" and the uniformity seems to be rather good.
>
> I'll try FNV in the future, however I don't think that's the main issue
> right now.
>
Hi Tomas,
If you are going to use a function that is not currently in the code,
please consider xxhash:
http://code.google.com/p/xxhash/
Here are some benchmarks for some of the faster hash functions:
Name Speed Q.Score Author
xxHash 5.4 GB/s 10
MumurHash 3a 2.7 GB/s 10 Austin Appleby
SpookyHash 2.0 GB/s 10 Bob Jenkins
SBox 1.4 GB/s 9 Bret Mulvey
Lookup3 1.2 GB/s 9 Bob Jenkins
CityHash64 1.05 GB/s 10 Pike & Alakuijala
FNV 0.55 GB/s 5 Fowler, Noll, Vo
CRC32 0.43 GB/s 9
MD5-32 0.33 GB/s 10 Ronald L. Rivest
SHA1-32 0.28 GB/s 10
Regards,
Ken
From | Date | Subject | |
---|---|---|---|
Next Message | Andres Freund | 2013-10-07 13:32:32 | Re: logical changeset generation v6.1 |
Previous Message | Fabien COELHO | 2013-10-07 12:02:31 | Re: pgbench progress report improvements - split 3 v2 - A |