From: | Florian Weimer <fweimer(at)bfk(dot)de> |
---|---|
To: | "Jon Stewart" <jonathan(dot)l(dot)stewart(at)gmail(dot)com> |
Cc: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: Creating large database of MD5 hash values |
Date: | 2008-04-11 14:05:00 |
Message-ID: | 82k5j4sd0z.fsf@mid.bfk.de |
Lists: | pgsql-performance |
* Jon Stewart:
> 1. Which datatype should I use to represent the hash value? UUIDs are
> also 16 bytes...
BYTEA is slower to load and a bit inconvenient to use from DBI, but
occupies less space on disk than TEXT or VARCHAR in hex form (17 vs 33
bytes with PostgreSQL 8.3).
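
The 17 vs 33 figures are consistent with the payload size plus a one-byte length header for short variable-length values. A minimal Python sketch of that arithmetic follows; the one-byte header is an assumption about 8.3's on-disk format, and `SHORT_VARLENA_HEADER` and `stored_sizes` are illustrative names, not PostgreSQL identifiers. Row and alignment overhead are ignored.

```python
import hashlib

# Assumed: short variable-length values carry a one-byte length header.
SHORT_VARLENA_HEADER = 1

def stored_sizes(data):
    """Return (bytea_bytes, hex_text_bytes) for the MD5 of `data`."""
    digest = hashlib.md5(data)
    raw = digest.digest()          # 16 raw bytes, as a BYTEA column would hold
    hex_form = digest.hexdigest()  # 32 hex characters, as TEXT/VARCHAR would hold
    return (len(raw) + SHORT_VARLENA_HEADER,
            len(hex_form) + SHORT_VARLENA_HEADER)

bytea_size, text_size = stored_sizes(b"example")
print(bytea_size, text_size)  # prints "17 33"
```

The hex form roughly doubles the per-value footprint, which also applies to any index built on the column.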
> 2. Does it make sense to denormalize the hash set relationships?
That depends entirely on your application.
> 3. Should I index?
Depends. B-tree is generally faster than Hash, even for randomly
distributed keys (like the output of a hash function).
> 4. What other data structure options would it make sense for me to
> choose?
See 2.
In general, hashing is bad because it destroys locality. But in some
cases, there is no other choice.
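
The loss of locality can be illustrated with a short Python sketch: consecutive inputs, which would sit next to each other in an index on the original value, land at unrelated positions once ordered by their MD5 digest. The input range and the helper name `md5_key` are made up for illustration.

```python
import hashlib

def md5_key(n):
    """MD5 digest of the decimal representation of n (hypothetical key scheme)."""
    return hashlib.md5(str(n).encode("ascii")).digest()

ids = list(range(1000, 1100))          # 100 consecutive inputs
keys = [md5_key(n) for n in ids]

# Rank of each input when rows are ordered by digest, as an index on the hash would be.
rank = {n: i for i, (_, n) in enumerate(sorted(zip(keys, ids)))}

for n in ids[:5]:
    print(n, md5_key(n).hex()[:8], "rank in digest order:", rank[n])
```

Inputs that are adjacent in the original order end up with unrelated ranks, so inserts and range-like access patterns touch index pages all over the tree instead of a few neighbouring ones.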
--
Florian Weimer <fweimer(at)bfk(dot)de>
BFK edv-consulting GmbH http://www.bfk.de/
Kriegsstraße 100 tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99