Re: Creating large database of MD5 hash values

From:	Florian Weimer <fweimer(at)bfk(dot)de>
To:	"Jon Stewart" <jonathan(dot)l(dot)stewart(at)gmail(dot)com>
Cc:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: Creating large database of MD5 hash values
Date:	2008-04-11 14:05:00
Message-ID:	82k5j4sd0z.fsf@mid.bfk.de
Lists:	pgsql-performance

* Jon Stewart:

> 1. Which datatype should I use to represent the hash value? UUIDs are
> also 16 bytes...

BYTEA is slower to load and a bit inconvenient to use from DBI, but
occupies less space on disk than TEXT or VARCHAR in hex form (17 vs 33
bytes with PostgreSQL 8.3).
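
The size difference can be reproduced with a small sketch. It assumes the 1-byte short header that PostgreSQL 8.3 uses for small variable-length values (the `SHORT_HEADER` constant below is that assumption, not something measured on a server); the helper names are made up for illustration.

```python
import hashlib

# Assumption: PostgreSQL 8.3 stores small variable-length values
# (BYTEA, TEXT, VARCHAR) with a 1-byte length header.
SHORT_HEADER = 1


def md5_raw(data: bytes) -> bytes:
    """Return the 16 raw bytes of the MD5 digest (what BYTEA would hold)."""
    return hashlib.md5(data).digest()


def md5_hex(data: bytes) -> str:
    """Return the 32-character hex form (what TEXT or VARCHAR would hold)."""
    return hashlib.md5(data).hexdigest()


raw = md5_raw(b"example")
hex_form = md5_hex(b"example")

raw_size = len(raw) + SHORT_HEADER        # payload plus header, in bytes
hex_size = len(hex_form) + SHORT_HEADER   # hex text is plain ASCII

print(raw_size, hex_size)
```

Converting between the two forms is lossless (`bytes.fromhex(hex_form)` gives back the raw digest), so the choice is about space and client convenience, not about information.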

> 2. Does it make sense to denormalize the hash set relationships?

That depends entirely on your application.

> 3. Should I index?

Depends. B-tree is generally faster than Hash, even for randomly
distributed keys (like the output of a hash function).

> 4. What other data structure options would it make sense for me to
> choose?

See 2.

In general, hashing is bad because it destroys locality. But in some
cases, there is no other choice.
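
A rough way to see the locality effect, as a sketch only: the file names, the page size of 100 entries, and the `pages_touched` helper are all hypothetical, and a sorted Python list stands in for an ordered index.

```python
import hashlib

PAGE_SIZE = 100  # hypothetical number of index entries per page


def pages_touched(wanted, ordered_keys, page_size=PAGE_SIZE):
    """Count the distinct pages that hold the wanted keys, given the
    order in which the keys are laid out."""
    position = {key: i for i, key in enumerate(ordered_keys)}
    return len({position[key] // page_size for key in wanted})


# 1000 made-up names that are naturally adjacent to one another.
names = [f"file{i:04d}" for i in range(1000)]

# Layout ordered by the original key versus by its MD5 digest.
by_name = sorted(names)
by_hash = sorted(names, key=lambda n: hashlib.md5(n.encode()).digest())

# A "range" of 100 neighbouring names.
wanted = names[:100]

pages_by_name = pages_touched(wanted, by_name)
pages_by_hash = pages_touched(wanted, by_hash)

print(pages_by_name, pages_by_hash)
```

Ordered by name, the 100 neighbouring entries sit together on one page; ordered by digest, they are scattered across the layout, so reading them touches many more pages.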

--
Florian Weimer <fweimer(at)bfk(dot)de>
BFK edv-consulting GmbH http://www.bfk.de/
Kriegsstraße 100 tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99

In response to

Creating large database of MD5 hash values at 2008-04-10 14:48:59 from Jon Stewart

Responses

Re: Creating large database of MD5 hash values at 2008-04-11 15:28:55 from Jon Stewart
