storing checksums in a database

From: Reece Hart <reece(at)harts(dot)net>
To: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: storing checksums in a database
Date: 2007-04-10 19:24:17
Message-ID: 1176233057.4904.55.camel@snafu.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

I'd like to store several kinds of checksums in a database and would
appreciate some advice about the best way to do this. I will use these
checksums for uniqueness constraints and for searching.

I currently store md5s as text and I could store other checksums as text
also, but it occurs to me that there are at least two other options.

One option is to store binary blobs instead. This would save 2x on
storage over hex. For example, sha1 is a 160 bit hash; that's 20 bytes
as binary or 40 (8-bit) chars as hex. As longer hashes are used, I
imagine that the space savings might have significant impact on index
building and query optimization decisions. The downside is that one
would need to encode/decode for lookups based on the hash.

Another option is to create a custom type for hex digests that would
provide type conversion as necessary. These would provide the space
benefit of storing as a binary and the usability of having postgresql Do
The Right Thing depending on the type of the representations being
compared. I can't think of any downsides.

So, the questions:

1) Am I worrying about nothing, or is the size of a hash significant? In
my case, the rows consist of a protein sequence (~350 chars, as text), a
timestamp, an integer PK, and an integer length. The text md5 is 32 text
chars, of course, to which I might add sha1 (40 text chars). Thus, the
size of hashes is a sizable fraction of the rest of the row.

2) Does anyone know of postgresql digest data types that provide these
conversions? If not, I'll do it myself... but I'd rather steal of
course.

Thanks,
Reece

--
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0
./universe -G 6.672e-11 -e 1.602e-19 -protonmass 1.673e-27 -uspres bush
kernel warning: universe consuming too many resources. Killing.
universe killed due to catastrophic leadership. Try -uspres carter.

Browse pgsql-general by date

  From Date Subject
Next Message John D. Burger 2007-04-10 19:47:26 Re: NEWBIE: How do I get the oldest date contained in 3 tables
Previous Message Sean Davis 2007-04-10 19:18:07 Versioning