lztext and compression ratios...

From: Jeffery Collins <collins(at)onyx-technologies(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: lztext and compression ratios...
Date: 2000-07-05 16:59:12
Message-ID: 39636960.FEF2400C@onyx-technologies.com
Lists: pgsql-general pgsql-hackers pgsql-sql

I have been looking at using the lztext type and I have some
questions/observations. Most of my experience comes from attempting to
compress text records in a different database (CTREE), but I think the
experience is transferable.

My typical table consists of variable-length text records. The average
record length is around 1K bytes. I would like to compress my records
to save space and improve I/O performance (smaller records mean more
records fit into the file system cache, which means less I/O - or so the
theory goes). I am not too concerned about CPU since we are using a 4-way
Sun Enterprise class server. So compression seems like a good idea to me.

My experience with attempting to compress such relatively small
(around 1K) text strings is that the compression ratio is not very
good. This is because the string is not long enough for the LZ
compression algorithm to establish really good compression patterns, and
because the de-compression table has to be built into each record. What
I have done in the past to get around these problems is "teach" the
compression algorithm the patterns ahead of time and store the
de-compression patterns in an external table. Using this technique, I
have achieved *much* better compression ratios.
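The "trained" approach described above can be sketched with zlib's preset-dictionary
support (a hypothetical illustration only - this is not how lztext works, and the
sample dictionary and record below are made up):

```python
import zlib

# Byte patterns we expect to recur across records, learned ahead of
# time from representative data (hypothetical sample dictionary).
shared_dict = b"order_id=;status=shipped;customer_name=;customer_address=;"

record = b"order_id=12345;status=shipped;customer_name=Smith;"

# Plain compression: a short record gives deflate little chance to
# discover repeated patterns on its own.
plain = zlib.compress(record)

# Preset-dictionary compression: the patterns are "taught" up front,
# so even a short record can reference them immediately.
comp = zlib.compressobj(zdict=shared_dict)
trained = comp.compress(record) + comp.flush()

# Decompression needs the same external dictionary - the analogue of
# storing the de-compression patterns in an external table.
decomp = zlib.decompressobj(zdict=shared_dict)
assert decomp.decompress(trained) == record

print(len(record), len(plain), len(trained))
```

On short records like this, the dictionary-primed stream typically comes out
noticeably smaller than the plain one, since long substrings of the record can be
encoded as back-references into the shared dictionary.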

So my questions/comments are:

- What are the typical compression ratios on relatively small (i.e.
around 1K) strings seen with lztext?
- Does anyone see a need/use for a generalized string compression
type that can be "trained" external to the individual records?
- Am I crazy in even attempting to compress strings of this relative
size? My largest table currently contains about 2 million entries of
roughly 1K-sized strings, or about 2 GB of data. If I could compress this
to about 33% of its original size (not unreasonable with a trained LZ
compression), I would save a lot of disk space (not really important)
and a lot of file system cache space (very important) and be able to fit
the entire table into memory (very, very important).
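As a quick back-of-envelope check of the figures above (numbers taken from the post):

```python
# Table size as described: ~2 million rows of ~1K-byte text records.
rows = 2_000_000
avg_row_bytes = 1024

table_bytes = rows * avg_row_bytes        # roughly 2 GB uncompressed
compressed_bytes = table_bytes * 0.33     # target: ~33% of original size

print(table_bytes // 2**20, "MB uncompressed")
print(int(compressed_bytes) // 2**20, "MB at 33%")
```

At 33%, the table drops from roughly 2 GB to well under 700 MB, which is what makes
fitting it entirely in the file system cache plausible.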

Thank you,
Jeff
