
Re: QuickLZ compression algorithm (Re: Inclusion in the PostgreSQL backend for toasting rows)

From:	"Robert Haas" <robertmhaas(at)gmail(dot)com>
To:	"Gregory Stark" <stark(at)enterprisedb(dot)com>
Cc:	"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Douglas McNaught" <doug(at)mcnaught(dot)org>, "Stephen R(dot) van den Berg" <srb(at)cuci(dot)nl>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, lar(at)quicklz(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: QuickLZ compression algorithm (Re: Inclusion in the PostgreSQL backend for toasting rows)
Date:	2009-01-06 14:57:10
Message-ID:	603c8f070901060657k40de254ew53f510e6b5a0b2dd@mail.gmail.com
Lists:	pgsql-hackers

>>> not compressing very small datums (< 256 bytes) also seems smart,
>>> since that could end up producing a lot of extra compression attempts,
>>> most of which will end up saving little or no space.
>
> That was presumably the rationale for the original logic. However experience
> shows that there are certainly databases that store a lot of compressible
> short strings.
>
> Obviously databases with CHAR(n) desperately need us to compress them. But
> even plain text data are often moderately compressible even with our fairly
> weak compression algorithm.
>
> One other thing that bothers me about our toast mechanism is that it only
> kicks in for tuples that are "too large". It seems weird that the same column
> is worth compressing or not depending on what other columns are in the same
> tuple.

That's a fair point. There's definitely some inconsistency in the
current behavior. It seems to me that, in theory, compression and
out-of-line storage are two separate behaviors. Out-of-line storage
is pretty much a requirement for dealing with large objects, given
that the page size is a constant; compression is not a requirement,
but definitely beneficial under some circumstances, particularly when
it removes the need for out-of-line storage.
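To make the distinction concrete, here is a rough sketch that treats the two decisions independently. It is not PostgreSQL's TOAST code: zlib stands in for the backend's own compressor, and `plan_storage` and `INLINE_LIMIT` are hypothetical names and values chosen only for illustration.

```python
import zlib
from typing import Tuple

# Hypothetical inline-size limit in bytes; not PostgreSQL's real threshold.
INLINE_LIMIT = 2000


def plan_storage(datum: bytes, inline_limit: int = INLINE_LIMIT) -> Tuple[bool, bool]:
    """Return (use_compression, store_out_of_line) for one datum.

    The two decisions are made separately: compression is kept only if it
    actually shrinks the datum, and out-of-line storage is chosen only if
    whatever we end up storing still exceeds the inline limit.
    """
    compressed = zlib.compress(datum)  # stand-in for the real compressor
    use_compression = len(compressed) < len(datum)
    stored_size = len(compressed) if use_compression else len(datum)
    store_out_of_line = stored_size > inline_limit
    return use_compression, store_out_of_line
```

With these assumptions, a highly repetitive 5000-byte value compresses well enough to stay inline, while an incompressible value above the limit is stored uncompressed and out of line.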

char(n) is kind of a weird case because you could also compress by
storing a count of the trailing spaces, without applying a
general-purpose compression algorithm. But either way the field is no
longer fixed-width, and therefore field access can't be done as a
simple byte offset from the start of the tuple.
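A minimal sketch of the trailing-space idea, and of why it costs fixed-offset access. All function names here are hypothetical, and the layout is a simplification, not PostgreSQL's tuple format.

```python
from typing import List, Tuple


def encode_char_n(value: str, width: int) -> Tuple[int, str]:
    """Store a CHAR(width) value as (trailing_space_count, body)."""
    if len(value) > width:
        raise ValueError("value longer than declared width")
    body = value.rstrip(" ")
    return width - len(body), body


def decode_char_n(pad: int, body: str) -> str:
    """Rebuild the blank-padded value from the stored pair."""
    return body + " " * pad


def fixed_offset(field_index: int, width: int) -> int:
    """With uniform fixed-width fields, the offset is plain arithmetic."""
    return field_index * width


def variable_offset(stored_lengths: List[int], field_index: int) -> int:
    """Once fields vary in size, every preceding length must be summed."""
    return sum(stored_lengths[:field_index])
```

For example, `encode_char_n("abc", 10)` stores a pad count of 7 and the 3-character body, but locating the next field then requires knowing that stored length rather than multiplying by the declared width.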

It's difficult even to enumerate the possible use cases, let alone
what knobs would be needed to cater to all of them.

...Robert

In response to

Re: QuickLZ compression algorithm (Re: Inclusion in the PostgreSQL backend for toasting rows) at 2009-01-06 07:47:24 from Gregory Stark

Responses

Re: QuickLZ compression algorithm (Re: Inclusion in the PostgreSQL backend for toasting rows) at 2009-01-06 15:11:27 from Alvaro Herrera
