From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: TOAST compression
Date: 2006-02-26 17:19:17
Message-ID: 19640.1140974357@sss.pgh.pa.us
Lists: pgsql-hackers

Neil Conway <neilc(at)samurai(dot)com> writes:
> toast_compress_datum() considers compression to be "successful" if the
> compressed version of the datum is smaller than the uncompressed
> version. I think this is overly generous: if compression reduces the
> size of the datum by, say, 0.01%, it is likely a net loss to use the
> compressed version of the datum since we'll need to pay for LZ
> decompression every time that we de-TOAST it. This situation can occur
> frequently when storing "mostly-uncompressible" data (compressed images,
> encrypted data, etc.) -- some parts of the data will compress well (e.g.
> metadata), but the vast majority will not.
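
(For concreteness, the acceptance test under discussion amounts to
something like the standalone sketch below; the threshold name and
value are invented for illustration and are not taken from
tuptoaster.c.)

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical knob: fraction of the raw size that must be saved. */
#define MIN_SAVINGS_FRACTION 0.01

/* Current rule: keep the compressed copy if it is strictly smaller. */
static bool
accept_current(size_t rawsize, size_t compsize)
{
	return compsize < rawsize;
}

/* Proposed rule: also demand a minimum saving, so a 0.01% win doesn't
 * cost an LZ decompression on every de-TOAST. */
static bool
accept_with_threshold(size_t rawsize, size_t compsize)
{
	return compsize < rawsize &&
		(rawsize - compsize) >= (size_t) (rawsize * MIN_SAVINGS_FRACTION);
}

int
main(void)
{
	/* a 1 MB datum that "compressed" to only 10 bytes less */
	printf("current: %d, with threshold: %d\n",
		   (int) accept_current(1048576, 1048566),
		   (int) accept_with_threshold(1048576, 1048566));
	return 0;
}
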
Does it really occur frequently? When dealing with already-compressed
or encrypted data, the LZ transform actually makes the data larger by
some small percentage. This will outweigh any savings on compressible
headers or what have you, just because those are only a tiny part of the
file to begin with. (Else the format designers would have found a way
to compress them too.) So I'd expect the existing test to catch most of
the real-world cases you cite.
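
To put rough numbers behind that, here's a toy model; the
one-control-bit-per-literal overhead (about 1/8) and the 90% header
saving are assumptions chosen for illustration, not measurements of
the backend's LZ code:

#include <stdio.h>

int
main(void)
{
	double	datum = 1024.0 * 1024.0;	/* 1 MB image */
	double	header = 4096.0;			/* compressible metadata */
	double	overhead = 1.0 / 8.0;		/* assumed literal-encoding overhead */
	double	hdr_save = 0.9;				/* assume header shrinks by 90% */

	double	result = (datum - header) * (1.0 + overhead)
					 + header * (1.0 - hdr_save);

	printf("raw %.0f bytes -> %.0f bytes (%+.2f%%)\n",
		   datum, result,
		   100.0 * (result - datum) / datum);
	return 0;
}

With those made-up numbers the output comes out roughly 12% larger
than the input, so the existing is-it-smaller test already rejects it.
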
I'm not particularly inclined to worry about this without some hard
evidence that it's a problem.

You'd need some numerical evidence anyway to justify any specific
threshold, else it's just as arbitrary as "is it smaller" ... and the
latter at least requires a few instructions less to check.

			regards, tom lane