| From: | Greg Stark <gsstark(at)mit(dot)edu> |
|---|---|
| To: | Andrew Piskorski <atp(at)piskorski(dot)com> |
| Cc: | pgsql-hackers(at)postgresql(dot)org |
| Subject: | Re: Compression and on-disk sorting |
| Date: | 2006-05-17 03:48:21 |
| Message-ID: | 871wuts456.fsf@stark.xeocode.com |
| Lists: | pgsql-hackers pgsql-patches |
Andrew Piskorski <atp(at)piskorski(dot)com> writes:
> The main tricks seem to be: One, EXTREMELY lightweight compression
> schemes - basically table lookups designed to be as cpu friendly as
> possible. Two, keep the data compressed in RAM as well so that you can
> also cache more of the data, and indeed keep the data compressed until
> as late in the CPU processing pipeline as possible.
>
> A corollary of that is to forget compression schemes like gzip - it
> reduces data size nicely but is far too slow on the cpu to be
> particularly useful in improving overall throughput rates.
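The quoted "table lookup" idea is essentially dictionary encoding of a low-cardinality column. A minimal sketch (hypothetical helper names, not PostgreSQL code): each distinct value is replaced by a small integer index, and decompression is a pure array lookup, which is what makes it so cheap on the CPU.

```python
# Dictionary-encoding sketch of the "table lookup" compression idea.
# These helpers are illustrative only, not actual PostgreSQL internals.

def dict_encode(values):
    """Map each distinct value to a small integer code."""
    table = {}
    codes = []
    for v in values:
        code = table.setdefault(v, len(table))
        codes.append(code)
    # Invert the table for decoding: code -> original value.
    lookup = [None] * len(table)
    for v, code in table.items():
        lookup[code] = v
    return lookup, codes

def dict_decode(lookup, codes):
    """Decompression is a plain array lookup -- very CPU friendly."""
    return [lookup[c] for c in codes]

col = ["red", "blue", "red", "red", "green", "blue"]
lookup, codes = dict_encode(col)
assert dict_decode(lookup, codes) == col
```

With only three distinct values, each code fits in two bits, so the column could be stored far more densely than the original strings while staying cheap to decode.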
There are some very fast decompression algorithms:
http://www.oberhumer.com/opensource/lzo/
I think most of the mileage from "lookup tables" would be better implemented
at a higher level by giving tools to data modellers that let them achieve
denser data representations. Things like convenient enum data types, 1-bit
boolean data types, short integer data types, etc.
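A sketch of what a denser representation buys, using 1-bit booleans as the example (hypothetical helpers, not a proposed PostgreSQL API): eight flags pack into one byte instead of a byte or more each.

```python
# Bit-packing sketch: store eight 1-bit booleans per byte instead of
# one (or more) bytes each. Illustrative only.

def pack_bools(bits):
    out = bytearray((len(bits) + 7) // 8)
    for i, b in enumerate(bits):
        if b:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)

def unpack_bools(data, n):
    return [bool((data[i // 8] >> (i % 8)) & 1) for i in range(n)]

flags = [True, False, True] * 5       # 15 booleans
packed = pack_bools(flags)
assert len(packed) == 2               # 15 bits fit in 2 bytes
assert unpack_bools(packed, 15) == flags
```

The same reasoning applies to enums and short integers: if the modeller can declare the real domain of a column, the system can pick a representation sized to that domain rather than to a generic type.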
--
greg