Re: Optimize external TOAST storage

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, davinder singh <davindersingh2692(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Optimize external TOAST storage
Date: 2022-03-23 11:10:37
Message-ID: CAFiTN-s5k=c7Ttm69PZmQHtCGwHMMj=xjHOpjP1Fusuq-Rjf5Q@mail.gmail.com
Lists: pgsql-hackers

On Fri, Mar 18, 2022 at 1:35 AM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
>
>
> I guess I think we should be slightly more ambitious. One idea could be to
> create a default_toast_compression_ratio GUC with a default of 0.95. This
> means that, by default, a compressed attribute must be 0.95x or less of the
> size of the uncompressed attribute to be stored compressed. Like
> default_toast_compression, this could also be overridden at the column
> level with something like

I am not sure that we want a GUC to control that, but we can certainly
be more ambitious. Basically, with the current patch, once the data is
even moderately large we would always prefer to store the compressed
data. E.g. if the data size is 200kB, then even if the compression
ratio is as low as 1% we would still choose to store the compressed
data.
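
To make the 200kB case concrete, here is a rough sketch of the arithmetic in Python. It assumes a chunk payload of 1996 bytes (what I believe TOAST_MAX_CHUNK_SIZE works out to on a default 8kB-block build) and assumes the patch keeps the compressed copy whenever it needs fewer chunks. The function names are made up for illustration and are not from the patch.

```python
# Assumption: chunk payload size on a default 8kB-block build.
CHUNK_SIZE = 1996


def chunks_needed(nbytes):
    """Number of chunks needed to store nbytes externally (ceiling division)."""
    return -(-nbytes // CHUNK_SIZE)


def chunk_difference(raw_size, compressed_size):
    """How many chunks are saved by storing the compressed copy."""
    return chunks_needed(raw_size) - chunks_needed(compressed_size)


raw = 200 * 1024               # 200kB of raw data
compressed = raw - raw // 100  # only 1% smaller after compression

print(chunks_needed(raw), chunks_needed(compressed))  # 103 102
print(chunk_difference(raw, compressed))              # 1
```

So a 1% saving on 200kB is already enough to drop one chunk out of 103, and a rule based only on "fewer chunks" would pick the compressed copy.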

I think we can base the decision on the compression ratio and then upper
bound it with the difference in the number of chunks. For example, if the
compression ratio is < 10% then store it uncompressed iff the chunk
difference < threshold. But with that we might see a performance impact
on smaller data which has a compression ratio < 10%, because its
chunk difference will always be under the threshold. So maybe the
chunk difference threshold can be a function of the total
number of chunks required for the data, maybe a logarithmic function,
so that the threshold grows slowly along with the base data size.
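
Something like the following Python sketch is what I have in mind. Everything in it is an assumption for the sake of discussion: the 1996-byte chunk size, treating the "compression ratio" as the fraction of bytes saved, the floor(log2) shape of the threshold, and all of the names. It is not code from the patch.

```python
CHUNK_SIZE = 1996   # assumption: chunk payload on a default 8kB-block build
MIN_RATIO = 0.10    # the 10% figure from the example above


def chunks_needed(nbytes):
    """Number of chunks needed to store nbytes externally (ceiling division)."""
    return -(-nbytes // CHUNK_SIZE)


def chunk_threshold(raw_chunks):
    """Hypothetical threshold: floor(log2(raw_chunks)), but never below 1."""
    return max(1, raw_chunks.bit_length() - 1)


def store_compressed(raw_size, compressed_size):
    """Return True if the compressed copy should be the one stored."""
    saved = raw_size - compressed_size
    if saved <= 0:
        return False              # compression did not help at all
    if saved / raw_size >= MIN_RATIO:
        return True               # good ratio: always keep compressed
    # Poor ratio: keep compressed only if it saves enough chunks.
    raw_chunks = chunks_needed(raw_size)
    diff = raw_chunks - chunks_needed(compressed_size)
    return diff >= chunk_threshold(raw_chunks)


raw = 200 * 1024
print(store_compressed(raw, raw - raw // 100))      # 1% saved  -> False
print(store_compressed(raw, raw - 8 * raw // 100))  # 8% saved  -> True
print(store_compressed(raw, raw // 2))              # 50% saved -> True
```

With these numbers the 200kB datum needs 103 chunks, so the threshold is 6: a 1% saving (1 chunk) is stored uncompressed, while an 8% saving (8 chunks) is still stored compressed. The exact function and constants would of course need testing.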

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
