Compression in LO and other fields

From: "Cary O'Brien" <cobrien(at)Radix(dot)Net>
To: pgsql-hackers(at)hub(dot)org
Subject: Compression in LO and other fields
Date: 1999-11-13 22:23:15
Message-ID: 199911132223.RAA05987@saltmine.radix.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Some observations about compression (sorry if this is obvious):

Compression won't work unless there is redundancy in the information
being compressed. In a normal [1] database, there may be a little
redundancy in each row. The real redundancy will be when you group
rows together. I.E. in it is quite likely that many rows will have
identical or similar values for some columns. It may be worth
considering compressing each storage block[2], in the hopes that each
block will contain several rows, and that the rows will have some
similar (redundant) information. Then compression will work. Just
compressing one column of one row may work in some situations, but
in general, it won't.

Once a while ago we implemented a storage system where we grouped
records (about 1k in size) into blocks (about 16k in size) and
compressed each block individually. To get at a record you had to
decompress the block it was in, and skip to the desired offset. Even
though the system had to throw away a lot [3] of data for each
retrieval, the system turned out to be be both economical of disk[4]
storage and fast at retrieval. For this app we found that compression
rates increased as we got up to about 16 records, and then flattend
off. I.E. compressing 16 records used less space than compressing 4
groups of 4, but compressing 256 records together used as much space
as compressing 16 groups of 16.

The point?

I wish I understood the PostgreSQL backend storage algorithms better,
but it seems that the combination of a Tuneable block size [5] and
an option to compress individual storage blocks [6] might be worth
looking at.

-- cary

[1] Whatever the heck that is.

[2] My lack of understanding of PostgreSQL storage manager
internals is showing.

[3] Half a block on average.

[4] Optical disk in this particular app.

[5] This is an option now, right? But is it compile time, or run-time?
and is it postmaster-wide, per database? I *try* to read the pgsql-hackers-digest
but there is just too much going on!

[6] Can this work with the existing data file structure?

Browse pgsql-hackers by date

  From Date Subject
Next Message Tim Holloway 1999-11-14 00:58:14 Re: [HACKERS] Thread-safe queueing?
Previous Message Tom Lane 1999-11-13 22:21:07 Status of sql_help.h