From: | Alex Stapleton <alexs(at)advfn(dot)com> |
---|---|
To: | Peter Fein <pfein(at)pobox(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Hash Function: MD5 or other? |
Date: | 2005-06-14 09:06:49 |
Message-ID: | 34DEB1AE-95EE-4462-8E51-9FEE07A9C565@advfn.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On 13 Jun 2005, at 23:49, Peter Fein wrote:
> Hi-
>
> I wanted to use a partially unique index (dependent on a flag) on a
> TEXT
> column, but the index row size was too big for btrees. See the thread
> "index row size 2728 exceeds btree maximum, 2713" from the
> beginning of
> this month for someone with a similar problem. In it, someone
> suggests
> indexing on a hash of the text. I'm fine with this, as the texts in
> question are similar enough to each other to make collisions unlikely
> and a collision won't really cause any serious problems.
>
> My question is: is the builtin MD5 appropriate for this use or
> should I
> be using a function from pl/something? Figures on collision rates
> would
> be nice as well - the typical chunk of text is probably 1k-8k.
>
> Thanks!
>
As others have said MD5 isn't the fastest one out there. However no
cryptographically secure hashes are really that fast. However you
can probably get away with using a CRC hash which is long enough to
reduce your chances of collision a lot. However, PostgreSQL doesn't
have a built in CRC function, which is a bit of a pain unless your
prepared to implement one, or use pl/* to do it, which sounds like
overkill. I suggest you run some benchmarks on MD5 and see if it's
fast enough to meet your current (and perhaps future) needs.
You could of course, just use a hash index on your text field! I
think that would probably cope with larger data sets OK. It has the
advantage of handling collisions for you as well :) However it means
you have to transfer large amounts of data around, so if network
speed ever becomes a limitation, MD5 hashing (or enabling compression
on your PgSQL connection) may help.
> --
> Peter Fein pfein(at)pobox(dot)com
> 773-575-0694
>
> Basically, if you're not a utopianist, you're a schmuck. -J. Feldman
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org
>
>
>
From | Date | Subject | |
---|---|---|---|
Next Message | Együd Csaba | 2005-06-14 11:35:45 | Re: PG 8.0.1 is getting slow in 24 hours. Only daily VACUUM |
Previous Message | Ioannis Theoharis | 2005-06-14 09:05:56 | Re: suse 9.2 |