hash options

From: "Little, Douglas" <DOUGLAS(dot)LITTLE(at)orbitz(dot)com>
To: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: hash options
Date: 2012-01-22 15:59:55
Message-ID: 8585BA53443004458E0BAA6134C5A7FB9CD327F6@EGEXCMB01.oww.root.lcl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hello,

I'm working on a data warehouse dimensionalization process where I need to hash a text string to use as the key.
I've implemented with MD5. It works fine, the problem I have is the size of the md5 (32 bytes) is often longer than the original string - thus not accomplishing what I want - space savings.

Does anybody have alternative hash function recommendations?
I looked at the options I knew of
select length(encode('ar=514','hex')); -- 12
select length(decode('ar=514','base64')); -- 24
select length(DIGEST('ar=514', 'md5')) -- 16bytes
select length(DIGEST('ar=514', 'sha1')) -- 20bytes

function is currently written in pg/plsql, but I'm considering switching to python for broader library choice.

Source data is delimited list of name/value pairs. Length range from 0-2500 bytes.
ar=514,cc=CA,ci=Montreal,cn=North+America,co=Sympatico,cs=Canada,nt=Xdsl,rc=QC,rs=Quebec,tp=High,tz=GMT%2D5

Thanks in advance
Doug Little

Sr. Data Warehouse Architect | Business Intelligence Architecture | Orbitz Worldwide
Douglas(dot)Little(at)orbitz(dot)com<mailto:Douglas(dot)Little(at)orbitz(dot)com>
[cid:image001(dot)jpg(at)01CCD8EB(dot)FDD3C490] orbitz.com<http://www.orbitz.com/> | ebookers.com<http://www.ebookers.com/> | hotelclub.com<http://www.hotelclub.com/> | cheaptickets.com<http://www.cheaptickets.com/> | ratestogo.com<http://www.ratestogo.com/> | asiahotels.com<http://www.asiahotels.com/>

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Chris Angelico 2012-01-22 16:10:53 Re: hash options
Previous Message John R Pierce 2012-01-22 09:05:09 Re: Database takes up MUCH more disk space than it should