From: wieck(at)debis(dot)com (Jan Wieck)
To: t-ishii(at)sra(dot)co(dot)jp
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, zakkr(at)zf(dot)jcu(dot)cz, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] compression in LO and other fields
Date: 1999-11-12 03:32:58
Message-ID: m11m7SM-0003kLC@orion.SAPserv.Hamburg.dsh.de
Lists: pgsql-hackers
Tatsuo Ishii wrote:
> > LO is a dead end. What we really want to do is eliminate tuple-size
> > restrictions and then have large ordinary fields (probably of type
> > bytea) in regular tuples. I'd suggest working on compression in that
> > context, say as a new data type called "bytez" or something like that.
>
> It sounds ideal, but I remember that Vadim said inserting a
> 2GB record is not a good idea since it will be written into
> the log too. If it's a necessary limitation from the point
> of view of WAL, we have to accept it, I think.
Just in case someone wants to implement a complete compressed
data type (including comparison functions, operators and a
default operator class for indexing):
I already made some tests with a type I called 'lztext'
locally. Only the input/output functions exist so far, and as
the name might suggest, it would be an alternative to 'text'.
It uses a simple but fast, byte-oriented LZ method with
backward pointers - no Huffman coding, no variable offset/size
tagging. The first byte of a chunk tells, bit by bit, whether
each of the following 8 items is a raw byte to copy or a
12-bit offset, 4-bit size copy instruction. That gives a
maximum back offset of 4096 and a maximum match size of 17
bytes.
What made it my preferred method was the fact that
decompression is done entirely from the already decompressed
portion of the data, so it does not need any code tables or
the like at that time.
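Roughly, the decompression loop looks like this (a simplified
sketch only, not the actual lztext code - the control bit
order, the exact layout of the two tag bytes and the
offset/size biases are just assumptions made for the
illustration):

    #include <stddef.h>

    /*
     * Decompress src (srclen bytes) into dst (dstlen bytes).
     * Assumed encoding: a set control bit marks a copy item,
     * control bits are consumed LSB first, offsets are stored
     * minus 1 (giving 1..4096) and sizes minus 2 (giving 2..17).
     * Returns the number of bytes written, or -1 on bad input.
     */
    static int
    lz_decompress(const unsigned char *src, size_t srclen,
                  unsigned char *dst, size_t dstlen)
    {
        size_t      sp = 0;         /* read position in src  */
        size_t      dp = 0;         /* write position in dst */

        while (sp < srclen)
        {
            unsigned char ctrl = src[sp++];
            int         bit;

            for (bit = 0; bit < 8 && sp < srclen; bit++, ctrl >>= 1)
            {
                if (ctrl & 1)
                {
                    /* copy item: 12 bit offset, 4 bit size */
                    unsigned int b0, b1, off, len;

                    if (sp + 1 >= srclen)
                        return -1;
                    b0 = src[sp++];
                    b1 = src[sp++];
                    off = ((b0 << 4) | (b1 >> 4)) + 1;  /* 1..4096 */
                    len = (b1 & 0x0F) + 2;              /* 2..17   */

                    if (off > dp || dp + len > dstlen)
                        return -1;
                    /* resolve the match from the output written so far */
                    while (len-- > 0)
                    {
                        dst[dp] = dst[dp - off];
                        dp++;
                    }
                }
                else
                {
                    /* literal item: one raw byte */
                    if (dp >= dstlen)
                        return -1;
                    dst[dp++] = src[sp++];
                }
            }
        }
        return (int) dp;
    }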
It is really FASTEST on decompression, which I assume would
be the most often used operation on huge data types. With
some care, comparison could be done on the fly while
decompressing two values, so that the entire comparison can
be aborted at the occurrence of the first difference.
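A rough sketch of how such an on-the-fly comparison could
work (again not existing code, just an illustration using the
same assumed tag layout as in the sketch above; the hist
buffers must be large enough for the fully decompressed
values, but only as much as needed to find a difference is
actually produced):

    #include <string.h>

    /* Incremental decompression state for one compressed value */
    typedef struct
    {
        const unsigned char *src;   /* compressed input        */
        size_t          srclen;
        size_t          sp;         /* read position in src    */
        unsigned char  *hist;       /* output produced so far  */
        size_t          histlen;
        size_t          dp;         /* bytes produced          */
        unsigned char   ctrl;       /* current control byte    */
        int             ctrlbits;   /* control bits left       */
        unsigned int    copy_off;   /* pending back reference  */
        unsigned int    copy_len;
    } lz_stream;

    /* Produce the next decompressed byte, -1 at end of input or error */
    static int
    lz_next_byte(lz_stream *s)
    {
        if (s->copy_len == 0)
        {
            int         bit;

            if (s->ctrlbits == 0)
            {
                if (s->sp >= s->srclen)
                    return -1;
                s->ctrl = s->src[s->sp++];
                s->ctrlbits = 8;
            }
            bit = s->ctrl & 1;
            s->ctrl >>= 1;
            s->ctrlbits--;

            if (!bit)
            {
                /* literal item */
                if (s->sp >= s->srclen || s->dp >= s->histlen)
                    return -1;
                s->hist[s->dp] = s->src[s->sp++];
                return s->hist[s->dp++];
            }
            /* copy item: set up a pending back reference */
            if (s->sp + 1 >= s->srclen)
                return -1;
            s->copy_off = ((s->src[s->sp] << 4) |
                           (s->src[s->sp + 1] >> 4)) + 1;
            s->copy_len = (s->src[s->sp + 1] & 0x0F) + 2;
            s->sp += 2;
            if (s->copy_off > s->dp)
                return -1;
        }
        /* emit one byte of the pending back reference */
        if (s->dp >= s->histlen)
            return -1;
        s->copy_len--;
        s->hist[s->dp] = s->hist[s->dp - s->copy_off];
        return s->hist[s->dp++];
    }

    /* Compare two compressed values, aborting at the first difference */
    static int
    lz_compare(const unsigned char *a, size_t alen,
               unsigned char *hist_a, size_t hista_len,
               const unsigned char *b, size_t blen,
               unsigned char *hist_b, size_t histb_len)
    {
        lz_stream   sa, sb;

        memset(&sa, 0, sizeof(sa));
        sa.src = a;  sa.srclen = alen;
        sa.hist = hist_a;  sa.histlen = hista_len;
        memset(&sb, 0, sizeof(sb));
        sb.src = b;  sb.srclen = blen;
        sb.hist = hist_b;  sb.histlen = histb_len;

        for (;;)
        {
            int         ca = lz_next_byte(&sa);
            int         cb = lz_next_byte(&sb);

            if (ca != cb)
                return (ca < cb) ? -1 : 1;  /* first difference decides */
            if (ca < 0)
                return 0;                   /* both ended: equal */
        }
    }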
The compression rates aren't that gigantic. I've got 30-50%
for rule plan strings (size limit on views!!!). And the
method used only allows for back references of at most 4K
into the buffer, so the rate will not grow for larger data
chunks. That's a heavy tradeoff, I know - compression rate on
one side, guaranteed absence of memory leaks and speed on the
other - but I prefer not to force it. Instead I usually use a
bigger hammer (the tuple size limit is still our original
problem - and another IBM 72GB disk doing 22-37 MB/s will
make any compressing data type obsolete then).
Sorry for the compression-specific jargon here. Well, is
anyone interested in the code?
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#========================================= wieck(at)debis(dot)com (Jan Wieck) #