From: | Karel Zak - Zakkr <zakkr(at)zf(dot)jcu(dot)cz> |
---|---|
To: | Jan Wieck <wieck(at)debis(dot)com> |
Cc: | t-ishii(at)sra(dot)co(dot)jp, tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-hackers(at)postgreSQL(dot)org |
Subject: | Re: [HACKERS] compression in LO and other fields |
Date: | 1999-11-12 09:38:55 |
Message-ID: | Pine.LNX.3.96.991112101708.14930B-100000@ara.zf.jcu.cz |
Lists: | pgsql-hackers |
On Fri, 12 Nov 1999, Jan Wieck wrote:
> Just in case someone wants to implement a complete compressed
> data type (including comparison functions, operators and
> a default operator class for indexing).
>
> I already made some tests with a type I called 'lztext'
> locally. Only the input-/output-functions exist so far and
> as the name might suggest, it would be an alternative for
> 'text'. It uses a simple but fast, byte oriented LZ backward
> pointing method. No Huffman coding or variable offset/size
> tagging. First byte of a chunk tells bitwise if the next
> following 8 items are raw bytes to copy or 12 bit offset, 4
> bit size copy information. That is max back offset 4096 and
> max match size 17 bytes.
Is this your original implementation, or did you use existing compression
code? I tried bzip2, but the output from that algorithm is completely
binary; I don't know how to use that in PgSQL when all the backend
(in/out) routines use char* (yes, I'm a newbie at PgSQL hacking :-).
>
> What made it my preferred method was the fact, that
> decompression is done entirely using the already decompressed
> portion of the data, so it does not need any code tables or
> the like at that time.
>
> It is really FASTEST on decompression, which I assume would
> be the most often used operation on huge data types. With
> some care, comparison could be done on the fly while
> decompressing two values, so that the entire comparison can
> be aborted at the occurrence of the first difference.
>
> The compression rates aren't that gigantic. I've got 30-50%
Isn't it a problem that your implementation compresses all the data at
once? Typically compression works on a stream, compressing only a small
buffer in each cycle.
> for rule plan strings (size limit on views!!!). And the
> method used only allows for buffer back references of 4K
> offsets at most, so the rate will not grow for larger data
> chunks. That's a heavy tradeoff between compression rate and
> no memory leakage for sure and speed, I know, but I prefer
> not to force it, instead I usually use a bigger hammer (the
> tuple size limit is still our original problem - and another
> IBM 72GB disk doing 22-37 MB/s will make any compressing data
> type obsolete then).
>
> Sorry for the compression specific slang here. Well, anyone
> interested in the code?
Yes, for me - I am finishing the to_char()/to_date() Oracle-compatible
routines (Thomas, are you still quiet?) and this would be a new challenge for me :-)
Karel