Re: [HACKERS] compression in LO and other fields

From: Karel Zak - Zakkr <zakkr(at)zf(dot)jcu(dot)cz>
To: Jan Wieck <wieck(at)debis(dot)com>
Cc: t-ishii(at)sra(dot)co(dot)jp, tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] compression in LO and other fields
Date: 1999-11-12 09:38:55
Message-ID: Pine.LNX.3.96.991112101708.14930B-100000@ara.zf.jcu.cz
Lists: pgsql-hackers


On Fri, 12 Nov 1999, Jan Wieck wrote:

> Just in case someone wants to implement a complete compressed
> data type (including comparison functions, operators and a
> default operator class for indexing).
>
> I already made some tests with a type I called 'lztext'
> locally. Only the input/output functions exist so far and,
> as the name might suggest, it would be an alternative to
> 'text'. It uses a simple but fast, byte-oriented LZ method
> with backward pointers. No Huffman coding or variable
> offset/size tagging. The first byte of a chunk tells, bit by
> bit, whether each of the following 8 items is a raw byte to
> copy or a 12-bit offset, 4-bit size copy instruction. That
> gives a maximum back offset of 4096 and a maximum match size
> of 17 bytes.

Is this your original implementation, or do you use some existing
compression code? I tried bzip2, but the output of that algorithm
is pure binary; I don't know how to use that in PgSQL when all the
backend (in/out) routines use char * (yes, I'm a newbie at PgSQL
hacking :-).
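Just to check that I understand the format you describe, here is a
rough sketch of how I imagine the decompression loop would look
(the bit layout of the tag, the bit order of the control byte and
the size encoding are only my guess, not your actual code):

/*
 * Sketch of a decompressor for the described scheme (my guess at
 * the details): a control byte whose 8 bits say, for the next 8
 * items, whether each one is a literal byte or a 2-byte back
 * reference packing a 12-bit offset and a 4-bit size.  I assume
 * the size is stored as (match length - 2), giving matches of
 * 2..17 bytes and back offsets up to 4095.
 */
static int
lz_decompress(const unsigned char *src, int srclen,
              unsigned char *dst, int dstmax)
{
    const unsigned char *sp = src;
    const unsigned char *send = src + srclen;
    unsigned char *dp = dst;
    unsigned char *dend = dst + dstmax;

    while (sp < send)
    {
        unsigned char ctrl = *sp++;
        int           bit;

        for (bit = 0; bit < 8 && sp < send; bit++)
        {
            if (ctrl & (1 << bit))
            {
                int            off;
                int            len;
                unsigned char *cp;

                /* back reference: 12-bit offset, 4-bit size */
                if (sp + 1 >= send)
                    return -1;      /* truncated input */
                off = ((sp[0] << 4) | (sp[1] >> 4)) & 0x0fff;
                len = (sp[1] & 0x0f) + 2;
                sp += 2;

                cp = dp - off;
                if (off == 0 || cp < dst || dp + len > dend)
                    return -1;      /* corrupt or too large */
                /* copy byte by byte; the match may overlap dp */
                while (len-- > 0)
                    *dp++ = *cp++;
            }
            else
            {
                /* literal byte, copied as is */
                if (dp >= dend)
                    return -1;
                *dp++ = *sp++;
            }
        }
    }
    return dp - dst;                /* decompressed length */
}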

>
> What made it my preferred method was the fact that
> decompression is done entirely using the already decompressed
> portion of the data, so it does not need any code tables or
> the like at that time.
>
> It is really FASTEST on decompression, which I assume would
> be the most often used operation on huge data types. With
> some care, comparison could be done on the fly while
> decompressing two values, so that the entire comparison can
> be aborted at the occurrence of the first difference.
>
> The compression rates aren't that gigantic. I've got 30-50%

Isn't it a problem that your implementation compresses all the
data at once? Typically compression works on a stream and
compresses only a small buffer in each cycle.

> for rule plan strings (size limit on views!!!). And the
> method used only allows buffer back references of 4K offsets
> at most, so the rate will not grow for larger data chunks.
> That's a heavy tradeoff between compression rate on one side
> and guaranteed freedom from memory leaks plus speed on the
> other, I know, but I prefer not to force it; instead I
> usually use a bigger hammer (the tuple size limit is still
> our original problem - and another IBM 72GB disk doing 22-37
> MB/s will make any compressing data type obsolete by then).
>
> Sorry for the compression-specific jargon here. Well, is
> anyone interested in the code?

Yes, for me - I'm finishing the to_char()/to_date()
Oracle-compatible routines (Thomas, are you still quiet?) and this
is a new, appealing task for me :-)

Karel
