Quick Links

Re: Custom compression methods

From:	Ildus Kurbangaliev <i(dot)kurbangaliev(at)postgrespro(dot)ru>
To:	Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Custom compression methods
Date:	2017-11-02 09:41:01
Message-ID:	20171102124101.5a28ecab@wp.localdomain
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Wed, 1 Nov 2017 17:05:58 -0400
Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com> wrote:

> On 9/12/17 10:55, Ildus Kurbangaliev wrote:
> >> The patch also includes custom compression method for tsvector
> >> which is used in tests.
> >>
> >> [1]
> >> https://www.postgresql.org/message-id/CAPpHfdsdTA5uZeq6MNXL5ZRuNx%2BSig4ykWzWEAfkC6ZKMDy6%3DQ%40mail.gmail.com
> > Attached rebased version of the patch. Added support of pg_dump, the
> > code was simplified, and a separate cache for compression options
> > was added.
>
> I would like to see some more examples of how this would be used, so
> we can see how it should all fit together.
>
> So far, it's not clear to me that we need a compression method as a
> standalone top-level object. It would make sense, perhaps, to have a
> compression function attached to a type, so a type can provide a
> compression function that is suitable for its specific storage.

In this patch compression methods is suitable for MAIN and EXTENDED
storages like in current implementation in postgres. Just instead only
of LZ4 you can specify any other compression method.

Idea is not to change compression for some types, but give the user and
extension developers opportunity to change how data in some attribute
will be compressed because they know about it more than database itself.

>
> The proposal here is very general: You can use any of the eligible
> compression methods for any attribute. That seems very complicated to
> manage. Any attribute could be compressed using either a choice of
> general compression methods or a type-specific compression method, or
> perhaps another type-specific compression method. That's a lot. Is
> this about packing certain types better, or trying out different
> compression algorithms, or about changing the TOAST thresholds, and
> so on?

It is about extensibility of postgres, for example if you
need to store a lot of time series data you can create an extension that
stores array of timestamps in more optimized way, using delta encoding
or something else. I'm not sure that such specialized things should be
in core.

In case of array of timestamps in could look like this:

CREATE EXTENSION timeseries; -- some extension that provides compression
method

Extension installs a compression method:

CREATE OR REPLACE FUNCTION timestamps_compression_handler(INTERNAL)
RETURNS COMPRESSION_HANDLER AS 'MODULE_PATHNAME',
'timestamps_compression_handler' LANGUAGE C STRICT;

CREATE COMPRESSION METHOD cm1 HANDLER timestamps_compression_handler;

And user can specify it in his table:

CREATE TABLE t1 (
time_series_data timestamp[] COMPRESSED cm1;
)

I think generalization of some method to a type is not a good idea. For
some attribute you could be happy with builtin LZ4, for other you can
need more compressibility and so on.

--
---
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

In response to

Re: Custom compression methods at 2017-11-01 21:05:58 from Peter Eisentraut

Responses

Re: Custom compression methods at 2017-11-02 15:02:34 from Craig Ringer

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Robert Haas	2017-11-02 11:01:05	Re: Add some const decorations to prototypes
Previous Message	Robert Haas	2017-11-02 09:02:25	Re: [POC] hash partitioning