Re: Compression of full-page-writes

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Compression of full-page-writes
Date: 2014-12-08 19:37:44
Message-ID: CA+TgmoYhw0pkAD=nPPdpoeT0itF5S3sHO-wEWEx7k9bYZS8VqA@mail.gmail.com
Lists: pgsql-hackers

On Mon, Dec 8, 2014 at 2:21 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2014-12-08 14:09:19 -0500, Robert Haas wrote:
>> > records, just fpis. There is no evidence that we even want to compress
>> > other record types, nor that our compression mechanism is effective at
>> > doing so. Simple => keep name as compress_full_page_writes
>>
>> Quite right.
>
> I don't really agree with this. There's lots of records which can be
> quite big where compression could help a fair bit. Most prominently
> HEAP2_MULTI_INSERT + INIT_PAGE. During initial COPY that's the biggest
> chunk of WAL. And these are big and repetitive enough that compression
> is very likely to be beneficial.
>
> I still think that just compressing the whole record if it's above a
> certain size is going to be better than compressing individual
> parts. Michael argued that that'd be complicated because of the varying
> size of the required 'scratch space'. I don't buy that argument
> though. It's easy enough to simply compress all the data in some fixed
> chunk size. I.e. always compress 64kB in one go. If there's more,
> compress that independently.
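
[Illustration: a minimal sketch of the fixed-chunk idea quoted above, not PostgreSQL code. It assumes zlib as a stand-in for whatever compressor the server would use, and the names compress_record and decompress_record are hypothetical. The point it tries to show is that each compression call sees at most CHUNK_SIZE bytes of input, so the scratch buffer it needs has a fixed upper bound regardless of record size.]

```python
import zlib
from typing import List

# Fixed chunk size taken from the quoted proposal (64 kB).
CHUNK_SIZE = 64 * 1024


def compress_record(record: bytes, chunk_size: int = CHUNK_SIZE) -> List[bytes]:
    """Compress a record as a list of independently compressed chunks.

    Each chunk is compressed on its own, so no call ever handles more
    than chunk_size bytes of input.
    """
    return [
        zlib.compress(record[offset:offset + chunk_size])
        for offset in range(0, len(record), chunk_size)
    ]


def decompress_record(chunks: List[bytes]) -> bytes:
    """Reverse compress_record by decompressing each chunk in order."""
    return b"".join(zlib.decompress(chunk) for chunk in chunks)
```

[The trade-off in this sketch is that repetition spanning a chunk boundary is not exploited, since every chunk starts with an empty compression history.]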

I agree that idea is worth considering. But I think we should decide
which way is better and then do just one or the other. I can't see
the point in adding wal_compress=full_pages now and then offering an
alternative wal_compress=big_records in 9.5.

I think it's also quite likely that there may be cases where
context-aware compression strategies can be employed. For example,
the prefix/suffix compression of updates that Amit did last cycle
exploits the likely commonality between the old and new tuple. We
might have cases like that where there are meaningful trade-offs to be
made between CPU and I/O, or other reasons to have user-exposed knobs.
I think we'll be much happier if those are completely separate GUCs,
so we can say things like compress_gin_wal=true and
compress_brin_effort=3.14 rather than trying to have a single
wal_compress GUC and assuming that we can shoehorn all future needs
into it.
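
[Illustration: a minimal sketch of what prefix/suffix compression of an update can look like in general, assuming tuples are plain byte strings. It is not the code from the patch mentioned above, and encode_update and decode_update are hypothetical names. Only the bytes that differ between the old and new versions are kept; the shared prefix and suffix are recorded as lengths and recovered from the old version.]

```python
from typing import Tuple

# (shared prefix length, shared suffix length, differing middle of the new tuple)
Delta = Tuple[int, int, bytes]


def encode_update(old: bytes, new: bytes) -> Delta:
    """Describe `new` relative to `old` by its common prefix and suffix."""
    limit = min(len(old), len(new))

    # Count leading bytes shared by both versions.
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1

    # Count trailing bytes shared by both versions, without overlapping
    # the prefix already counted.
    suffix = 0
    while suffix < limit - prefix and old[-1 - suffix] == new[-1 - suffix]:
        suffix += 1

    return prefix, suffix, new[prefix:len(new) - suffix]


def decode_update(old: bytes, delta: Delta) -> bytes:
    """Rebuild the new tuple from the old tuple and the delta."""
    prefix, suffix, middle = delta
    return old[:prefix] + middle + old[len(old) - suffix:]
```

[Usage, under the same assumptions: encoding the change from b"id=42;name=alice;city=paris" to b"id=42;name=bob;city=paris" keeps only b"bob" plus two lengths. This kind of scheme depends on knowing the old tuple, which is why it is context-aware rather than a generic whole-record compressor.]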

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
