From: Nikita Malakhov <hukutoc(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Aleksander Alekseev <aleksander(at)timescale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Jacob Champion <jchampion(at)timescale(dot)com>
Subject: Re: [PATCH] Compression dictionaries for JSONB
Date: 2023-02-10 18:22:14
Message-ID: CAN-LCVOAKFe4d2aeqkUnY-ibqc-OwSiUeMJFxzZX+b8qCrJsgA@mail.gmail.com
Lists: pgsql-hackers
Hi!
If I understand Andres' message correctly, the proposal is to make the use
of compression dictionaries automatic, possibly just by setting a parameter
when the table is created, something like:
CREATE TABLE t ( ..., t JSONB USE DICTIONARY);
The question then is how to create such dictionaries automatically and
extend them while data is being added to the table. It is not unusual for
circumstances to change over time, so that a table that started out rather
small begins to receive huge amounts of data.
I prefer extending a dictionary over re-creating it, because while a
dictionary is being recreated we leave users only two choices: wait until
dictionary creation is over, or use the old version (say, kept as a
snapshot while the new one is created). Keeping many versions
simultaneously does not make sense and would increase database size.
Also, compressing small data with a large dictionary (the case of one
dictionary shared by many tables) would, I think, add considerable
overhead to INSERT/UPDATE commands, so the most reasonable choice is a
per-table dictionary.
Am I right?
Any ideas on how to create and extend such dictionaries automatically?
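To make the benefit concrete, here is a minimal sketch in Python using the
standard library's zlib preset-dictionary support (not PostgreSQL internals;
the dictionary contents and the sample document are made up for
illustration). It shows why a dictionary built from substrings common to a
table's JSONB documents compresses small tuples far better than plain
per-value compression can:

```python
import zlib

# A preset dictionary built from keys/values common to the table's
# documents. In the proposal this would be maintained per table;
# here it is hand-built for illustration.
zdict = b'{"user_id": "order_id": "status": "shipped" "pending"'

# One small JSONB document, typical attribute-level payload.
doc = b'{"user_id": 42, "order_id": 7, "status": "shipped"}'

# Plain compression of one small document: almost nothing to
# back-reference within the document itself.
plain = zlib.compress(doc, 9)

# Dictionary-assisted compression: common substrings become
# back-references into the preset dictionary instead of literals.
c = zlib.compressobj(level=9, zdict=zdict)
with_dict = c.compress(doc) + c.flush()

# Decompression must supply the exact same dictionary - which is why
# recreating (rather than only appending to) a dictionary invalidates
# already-compressed values.
d = zlib.decompressobj(zdict=zdict)
assert d.decompress(with_dict) == doc

print(len(doc), len(plain), len(with_dict))
```

Note that zlib only appends to the effective dictionary window as data
streams through; an append-only dictionary keeps old compressed values
decompressible, which is the property that makes extending preferable to
re-creating.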
On Thu, Feb 9, 2023 at 2:01 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> Hi,
>
> On February 9, 2023 2:50:57 AM PST, Aleksander Alekseev <
> aleksander(at)timescale(dot)com> wrote:
> >Hi Andres,
> >
> >> > So to clarify, are we talking about tuple-level compression? Or
> >> > perhaps page-level compression?
> >>
> >> Tuple level.
> >
> >> although my own patch proposed attribute-level compression, not
> >> tuple-level one, it is arguably closer to tuple-level approach than
> >> page-level one
> >
> >Just wanted to make sure that by tuple-level we mean the same thing.
> >
> >When saying tuple-level do you mean that the entire tuple should be
> >compressed as one large binary (i.e. similarly to page-level
> >compression but more granularly), or every single attribute should be
> >compressed separately (similarly to how TOAST does this)?
>
> Good point - should have been clearer. I meant attribute wise compression.
> Like we do today, except that we would use a dictionary to increase
> compression rates.
>
> Andres
> --
> Sent from my Android device with K-9 Mail. Please excuse my brevity.
>
--
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/