Re: RFC: compression dictionaries for JSONB

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Aleksander Alekseev <aleksander(at)timescale(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Subject: Re: RFC: compression dictionaries for JSONB
Date: 2021-10-13 21:25:35
Message-ID: CAEze2WiUEhP6gSMTumZZR7P=7-FttT8jHaadEa2KM3wxdLiv7A@mail.gmail.com
Lists: pgsql-hackers

On Wed, 13 Oct 2021 at 11:48, Aleksander Alekseev
<aleksander(at)timescale(dot)com> wrote:
>
> Hi Matthias,
>
> > Assuming this above is option 1. If I understand correctly, this
> > option was 'adapt the data type so that it understands how to handle a
> > shared dictionary, decreasing storage requirements'.
> > [...]
> > Assuming this was the 2nd option. If I understand correctly, this
> > option is effectively 'adapt or wrap TOAST to understand and handle
> > dictionaries for dictionary encoding common values'.
>
> Yes, exactly.
>
> > I think that an 'universal dictionary encoder' would be useful, but
> > that a data type might also have good reason to implement their
> > replacement methods by themselves for better overall performance (such
> > as maintaining partial detoast support in dictionaried items, or
> > overall lower memory footprint, or ...). As such, I'd really
> > appreciate it if Option 1 is not ruled out by any implementation of
> > Option 2.
>
> I agree, having the benefits of two approaches in one feature would be
> great. However, I'm having some difficulties imagining how the
> implementation would look like in light of the pros and cons stated
> above. I could use some help here.
>
> One approach I can think of is introducing a new entity, let's call it
> "dictionary compression method". The idea is similar to access methods
> and tableam's. There is a set of callbacks the dictionary compression
> method should implement, some are mandatory, some can be set to NULL.

You might also want to look into the 'Pluggable compression support'
[0] and 'Custom compression methods' [1] threads for inspiration, as
what you describe seems very similar to what was originally proposed
there. (†)

One important difference from what was discussed in [0][1] is that the
compression proposed here is at the type level, while the compression
proposed in both 'Pluggable compression support' and 'Custom
compression methods' is at the column / table / server level.

> Users can specify the compression method for the dictionary:
>
> ```
> CREATE TYPE name AS DICTIONARY OF JSONB (
> compression_method = 'jsonb_best_compression'
> -- compression_method = 'jsonb_fastest_partial_decompression'
> -- if not specified, some default compression method is used
> );
> ```
>
> JSONB is maybe not the best example of the type for which people may
> need multiple compression methods in practice. But I can imagine how
> overwriting a compression method for, let's say, arrays in an
> extension could be beneficial depending on the application.
>
> This approach will make an API well-defined and, more importantly,
> extendable. In the future, we could add additional (optional) methods
> for particular scenarios, like partial decompression.
>
> Does it sound like a reasonable approach?

Yes, I think that's doable.
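
To make the callback idea a bit more concrete, here is a rough sketch in
plain C of what such a method table might look like, loosely modelled on
how table access methods expose a routine struct. Every name in it
(DictCompressionRoutine, dict_routine_is_valid, dict_decompress_prefix,
identity_routine) is hypothetical and does not exist in PostgreSQL; the
"codec" is a plain byte copy that only serves to exercise the mandatory /
optional split, with partial decompression as the optional callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical callback table for a "dictionary compression method".
 * None of these names exist in PostgreSQL; this is only an illustration.
 */
typedef struct DictCompressionRoutine
{
    /* Mandatory: encode src into dst; returns bytes written, 0 on failure. */
    size_t (*compress) (const char *src, size_t srclen,
                        char *dst, size_t dstcap);

    /* Mandatory: decode src into dst; returns bytes written, 0 on failure. */
    size_t (*decompress) (const char *src, size_t srclen,
                          char *dst, size_t dstcap);

    /* Optional (may be NULL): decode only the first slicelen bytes. */
    size_t (*decompress_slice) (const char *src, size_t srclen,
                                char *dst, size_t dstcap, size_t slicelen);
} DictCompressionRoutine;

/* A routine is usable only if all mandatory callbacks are set. */
bool
dict_routine_is_valid(const DictCompressionRoutine *r)
{
    return r != NULL && r->compress != NULL && r->decompress != NULL;
}

/*
 * Fetch a prefix of the value. Uses the optional callback when the method
 * provides one, otherwise falls back to a full decompression and reports
 * at most slicelen bytes as the usable prefix.
 */
size_t
dict_decompress_prefix(const DictCompressionRoutine *r,
                       const char *src, size_t srclen,
                       char *dst, size_t dstcap, size_t slicelen)
{
    size_t      n;

    assert(dict_routine_is_valid(r));

    if (r->decompress_slice != NULL)
        return r->decompress_slice(src, srclen, dst, dstcap, slicelen);

    n = r->decompress(src, srclen, dst, dstcap);
    return n < slicelen ? n : slicelen;
}

/* Stand-in codec: copies bytes unchanged, fails if dst is too small. */
size_t
identity_copy(const char *src, size_t srclen, char *dst, size_t dstcap)
{
    if (srclen > dstcap)
        return 0;
    memcpy(dst, src, srclen);
    return srclen;
}

/* Example method that leaves the optional callback unset. */
const DictCompressionRoutine identity_routine = {
    .compress = identity_copy,
    .decompress = identity_copy,
    .decompress_slice = NULL,
};
```

The point of the fallback in dict_decompress_prefix is that callers never
need to know whether a given method implements the optional callback, which
is what would allow new optional methods to be added later without breaking
existing implementations.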

Kind regards,

Matthias

(†): 'Custom compression methods' eventually got committed in an
entirely different state by way of commit bbe0a81db, where LZ4 is
now a TOAST compression option that can be configured at the column /
system level. This is a hard-coded compression method, so no
infrastructure (or at least, API) is available for custom compression
methods in that code.

[0] https://www.postgresql.org/message-id/flat/20130614230142.GC19641%40awork2.anarazel.de
[1] https://www.postgresql.org/message-id/flat/20170907194236.4cefce96%40wp.localdomain
