Re: Compression and on-disk sorting

From: "Zeugswetter Andreas DCP SD" <ZeugswetterA(at)spardat(dot)at>
To: "Martijn van Oosterhout" <kleptog(at)svana(dot)org>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Greg Stark" <gsstark(at)mit(dot)edu>, "Andrew Piskorski" <atp(at)piskorski(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Compression and on-disk sorting
Date: 2006-05-17 11:47:17
Message-ID: E1539E0ED7043848906A8FF995BDA5790105450D@m0143.s-mxs.net
Lists: pgsql-hackers


> Unfortunately, the interface provided by pg_lzcompress.c is probably
> insufficient for this purpose. You want to be able to compress tuples
> as they get inserted and start a new block once the output reaches a

I don't think anything that compresses single tuples without context is
going to be a win under realistic circumstances.

I would at least compress whole pages: allow a maximum ratio of 1:n,
keep the PostgreSQL buffer cache uncompressed, and compress only on write
(the filesystem cache then holds the compressed pages).
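
A minimal sketch of that write path, assuming n = 2 (16k logical pages in the
buffer cache, 8k physical blocks on disk) and using LZO's real
lzo1x_1_compress(); the function name and layout are only illustrative, not an
actual patch:

#include <string.h>
#include <lzo/lzo1x.h>

#define BLCKSZ      8192            /* physical block written to disk */
#define LOGICAL_SZ  (2 * BLCKSZ)    /* n = 2: page size in the buffer cache */

/* Compress a logical page just before it leaves the buffer cache.
 * Returns the number of bytes to hand to the filesystem.
 * Assumes lzo_init() was called once at startup. */
static size_t
compress_page_for_write(const unsigned char *page, unsigned char *out)
{
    static unsigned char wrkmem[LZO1X_1_MEM_COMPRESS];
    unsigned char tmp[LOGICAL_SZ + LOGICAL_SZ / 16 + 64 + 3]; /* LZO worst case */
    lzo_uint clen = 0;

    lzo1x_1_compress(page, LOGICAL_SZ, tmp, &clen, wrkmem);

    if (clen <= BLCKSZ)
    {
        memcpy(out, tmp, clen);
        return clen;                /* reached 2:1, one physical block */
    }

    /* Page didn't compress to 2:1; fall back to writing it uncompressed
     * across two blocks (or whatever policy gets chosen). */
    memcpy(out, page, LOGICAL_SZ);
    return LOGICAL_SZ;
}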

The tricky part is predicting whether a tuple still fits in an
n*8k-uncompressed / 8k-compressed page, but since LZO is fast you might
even run a trial compression in the corner cases (that logic probably
also needs to go into the available-page-freespace calculation).
Choosing a good n is also tricky; 2 (or 3?) is probably good.

You probably also want to always keep the header part of the page
uncompressed.
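
Putting those two points together, the freespace check could be a trial
compression of the page body (header left out, since it stays uncompressed)
with the candidate tuple appended; only when the uncompressed size exceeds one
block does the compressor actually need to run. The names and the 24-byte
header size here are made up for illustration; only the LZO call is the real
library API:

#include <stdbool.h>
#include <string.h>
#include <lzo/lzo1x.h>

#define BLCKSZ      8192
#define LOGICAL_SZ  (2 * BLCKSZ)
#define HEADER_SZ   24              /* page header stays uncompressed */

/* Would the page still fit into one 8k block on disk after adding
 * 'tuple' to its body?  'body'/'body_len' is the used part of the
 * page after the header.  Assumes lzo_init() was called at startup. */
static bool
tuple_still_fits(const unsigned char *body, size_t body_len,
                 const unsigned char *tuple, size_t tuple_len)
{
    static unsigned char wrkmem[LZO1X_1_MEM_COMPRESS];
    unsigned char trial[LOGICAL_SZ];
    unsigned char tmp[LOGICAL_SZ + LOGICAL_SZ / 16 + 64 + 3];
    lzo_uint clen = 0;

    if (HEADER_SZ + body_len + tuple_len > LOGICAL_SZ)
        return false;               /* over the n*8k hard limit */

    if (HEADER_SZ + body_len + tuple_len <= BLCKSZ)
        return true;                /* fits even without compression */

    /* Corner case: only a trial compression can tell; LZO is cheap
     * enough that doing this per insertion seems plausible. */
    memcpy(trial, body, body_len);
    memcpy(trial + body_len, tuple, tuple_len);
    lzo1x_1_compress(trial, body_len + tuple_len, tmp, &clen, wrkmem);

    return HEADER_SZ + clen <= BLCKSZ;
}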

Andreas
