Re: [HACKERS] compression in LO and other fields

From: wieck(at)debis(dot)com (Jan Wieck)
To: t-ishii(at)sra(dot)co(dot)jp
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, zakkr(at)zf(dot)jcu(dot)cz, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] compression in LO and other fields
Date: 1999-11-12 03:32:58
Message-ID: m11m7SM-0003kLC@orion.SAPserv.Hamburg.dsh.de
Lists: pgsql-hackers

Tatsuo Ishii wrote:

> > LO is a dead end. What we really want to do is eliminate tuple-size
> > restrictions and then have large ordinary fields (probably of type
> > bytea) in regular tuples. I'd suggest working on compression in that
> > context, say as a new data type called "bytez" or something like that.
>
> It sounds ideal but I remember that Vadim said inserting a 2GB record
> is not good idea since it will be written into the log too. If it's a
> necessary limitation from the point of view of WAL, we have to accept
> it, I think.

Just in case someone wants to implement a complete compressed
data type (including comparison functions, operators and a
default operator class for indexing):

I already made some tests with a type I called 'lztext'
locally. Only the input/output functions exist so far, and as
the name might suggest, it would be an alternative to 'text'.
It uses a simple but fast, byte-oriented LZ method with
backward pointers. No Huffman coding or variable offset/size
tagging. The first byte of a chunk tells, bitwise, whether
each of the following 8 items is a raw byte to copy or a copy
tag with a 12-bit offset and a 4-bit size. That means a
maximum back offset of 4096 and a maximum match size of 17
bytes.

What made it my preferred method is the fact that
decompression works entirely from the already decompressed
portion of the data, so it does not need any code tables or
the like at that time.
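
To make that concrete, here is a minimal sketch of such a
decompression loop, written against the tag scheme described
above. It is not the actual lztext code; the exact layout of
the two tag bytes and the length bias of 2 (so the 4 bit
field covers match sizes 2-17) are assumptions of this
sketch:

    #include <stddef.h>

    /*
     * Sketch of a decompressor for the assumed tag format:
     * a control byte, then per set bit a 2-byte copy tag
     * (12-bit back offset, 4-bit length biased by 2) and per
     * clear bit one literal byte.
     */
    static size_t
    lz_decompress(const unsigned char *src, size_t srclen,
                  unsigned char *dst, size_t dstmax)
    {
        size_t sp = 0;              /* read position in src */
        size_t dp = 0;              /* write position in dst */

        while (sp < srclen)
        {
            unsigned char ctrl = src[sp++];
            int           i;

            for (i = 0; i < 8 && sp < srclen; i++, ctrl >>= 1)
            {
                if (ctrl & 1)
                {
                    /* copy tag: 12-bit offset, 4-bit length */
                    unsigned int off;
                    unsigned int len;

                    if (sp + 1 >= srclen)
                        return 0;   /* truncated tag */
                    off = (src[sp] << 4) | (src[sp + 1] >> 4);
                    len = (src[sp + 1] & 0x0f) + 2;
                    sp += 2;
                    if (off == 0 || off > dp || dp + len > dstmax)
                        return 0;   /* corrupt input */

                    /* copy from already decompressed output */
                    while (len-- > 0)
                    {
                        dst[dp] = dst[dp - off];
                        dp++;
                    }
                }
                else
                {
                    /* literal byte */
                    if (dp >= dstmax)
                        return 0;
                    dst[dp++] = src[sp++];
                }
            }
        }
        return dp;                  /* bytes produced */
    }

The copy loop reads from dst only, which is exactly why no
code table or separate window buffer is needed while
decompressing.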

It is really FASTEST on decompression, which I assume would
be the most frequently used operation on huge data types.
With some care, comparison could be done on the fly while
decompressing two values, so that the entire comparison can
be aborted at the occurrence of the first difference.
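
To illustrate, here is a rough sketch of such an aborting
comparison, again using the assumed tag layout from the
sketch above. The incremental lz_step() helper, the LzStream
struct and the caller-supplied scratch buffers are just
illustration, not existing code:

    #include <stddef.h>

    typedef struct LzStream
    {
        const unsigned char *src;   /* compressed input */
        size_t               srclen;
        size_t               sp;    /* read position */
        unsigned char       *dst;   /* output so far */
        size_t               dp;    /* bytes produced */
        size_t               dstmax;
    } LzStream;

    /*
     * Decompress one control group (control byte plus up to
     * 8 items); returns the bytes appended, 0 when done.
     */
    static size_t
    lz_step(LzStream *s)
    {
        size_t        start = s->dp;
        unsigned char ctrl;
        int           i;

        if (s->sp >= s->srclen)
            return 0;
        ctrl = s->src[s->sp++];

        for (i = 0; i < 8 && s->sp < s->srclen; i++, ctrl >>= 1)
        {
            if (ctrl & 1)
            {
                unsigned int off, len;

                if (s->sp + 1 >= s->srclen)
                    break;          /* truncated tag */
                off = (s->src[s->sp] << 4) | (s->src[s->sp + 1] >> 4);
                len = (s->src[s->sp + 1] & 0x0f) + 2;
                s->sp += 2;
                if (off == 0 || off > s->dp || s->dp + len > s->dstmax)
                    break;          /* corrupt input */
                while (len-- > 0)
                {
                    s->dst[s->dp] = s->dst[s->dp - off];
                    s->dp++;
                }
            }
            else
            {
                if (s->dp >= s->dstmax)
                    break;          /* output full */
                s->dst[s->dp++] = s->src[s->sp++];
            }
        }
        return s->dp - start;
    }

    /*
     * Compare two compressed values while decompressing them,
     * aborting at the first differing byte; the caller
     * supplies scratch buffers big enough for the plain data.
     */
    static int
    lz_compare(const unsigned char *a, size_t alen,
               const unsigned char *b, size_t blen,
               unsigned char *bufa, unsigned char *bufb,
               size_t bufmax)
    {
        LzStream sa = {a, alen, 0, bufa, 0, bufmax};
        LzStream sb = {b, blen, 0, bufb, 0, bufmax};
        size_t   cmp = 0;           /* bytes compared so far */

        for (;;)
        {
            size_t common;

            /* refill whichever side ran out of new output */
            if (sa.dp == cmp)
                (void) lz_step(&sa);
            if (sb.dp == cmp)
                (void) lz_step(&sb);

            /* a side still without new output is exhausted */
            if (sa.dp == cmp || sb.dp == cmp)
            {
                if (sa.dp == cmp && sb.dp == cmp)
                    return 0;                   /* equal */
                return (sa.dp == cmp) ? -1 : 1; /* shorter first */
            }

            /* compare new common prefix, abort on mismatch */
            common = (sa.dp < sb.dp) ? sa.dp : sb.dp;
            for (; cmp < common; cmp++)
                if (bufa[cmp] != bufb[cmp])
                    return (bufa[cmp] < bufb[cmp]) ? -1 : 1;
        }
    }

This way only the prefixes up to the first difference ever
get decompressed; equal values are the worst case.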

The compression rates aren't that gigantic. I've got 30-50%
for rule plan strings (size limit on views!!!). And the
method only allows buffer back references of at most 4K
offsets, so the rate will not grow for larger data chunks.
That's a heavy tradeoff between compression rate on one side
and speed plus guaranteed absence of memory leaks on the
other, I know, but I prefer not to force it; instead I
usually use a bigger hammer (the tuple size limit is still
our original problem - and another IBM 72GB disk doing 22-37
MB/s will make any compressing data type obsolete then).

Sorry for the compression-specific jargon here. Well, is
anyone interested in the code?

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#========================================= wieck(at)debis(dot)com (Jan Wieck) #
