Re: ZStandard (with dictionaries) compression support for TOAST compression

From: Nikhil Kumar Veldanda <veldanda(dot)nikhilkumar17(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: ZStandard (with dictionaries) compression support for TOAST compression
Date: 2025-04-15 18:13:29
Message-ID: CAFAfj_Gb-1HxZ303MkX5Z8skzc6DAg0Ygc_nRqop3LB-PB2bnQ@mail.gmail.com
Lists: pgsql-hackers

Hi Robert,

Thank you for your response, and apologies for the delay in getting
back to you. You raised some important concerns in your reply; I have
worked to understand and, I hope, address these two:

* Dictionary Cleanup via Dependency Tracking
* Addressing the compressed datum leak problem (via CTAS, INSERT INTO ...
SELECT ...)

Dictionary Cleanup via Dependency Tracking:

To address your question on how we can safely clean up unused
dictionaries, I have implemented a mechanism based on PostgreSQL's
standard dependency system (pg_depend); permit me to explain.

When a Zstandard dictionary is created for a table, we record a
DEPENDENCY_NORMAL dependency from the table to the dictionary. This
ensures that when the table is dropped, the corresponding entry is
removed from the pg_depend catalog. Users can then call the
cleanup_unused_dictionaries() function to remove any dictionaries that
are no longer referenced by any table.

/* Record a dependency from the table to the dictionary. */
{
	ObjectAddress dictObj;
	ObjectAddress relation;

	ObjectAddressSet(dictObj, ZstdDictionariesRelationId, dictid);
	ObjectAddressSet(relation, RelationRelationId, relid);

	/* NORMAL dependency: relid -> dictionary */
	recordDependencyOn(&relation, &dictObj, DEPENDENCY_NORMAL);
}
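For reference, the "is this dictionary still referenced?" test inside cleanup_unused_dictionaries() can be sketched as an indexed probe of pg_depend. This is an illustrative sketch rather than the exact patch code: the helper name and ZstdDictionariesRelationId come from this patch set, while the catalog-scan APIs are stock PostgreSQL:

```c
/*
 * Sketch: does any pg_depend entry reference the given dictionary?
 * A dictionary is eligible for cleanup only when this returns false.
 */
static bool
dictionary_is_referenced(Oid dictid)
{
	Relation	depRel;
	ScanKeyData key[2];
	SysScanDesc scan;
	bool		found;

	depRel = table_open(DependRelationId, AccessShareLock);

	ScanKeyInit(&key[0], Anum_pg_depend_refclassid,
				BTEqualStrategyNumber, F_OIDEQ,
				ObjectIdGetDatum(ZstdDictionariesRelationId));
	ScanKeyInit(&key[1], Anum_pg_depend_refobjid,
				BTEqualStrategyNumber, F_OIDEQ,
				ObjectIdGetDatum(dictid));

	scan = systable_beginscan(depRel, DependReferenceIndexId, true,
							  NULL, 2, key);
	found = HeapTupleIsValid(systable_getnext(scan));

	systable_endscan(scan);
	table_close(depRel, AccessShareLock);

	return found;
}
```

cleanup_unused_dictionaries() then simply deletes every pg_zstd_dictionaries row for which this probe finds nothing, and reports the number of rows removed.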

Example: Consider two tables, each using its own Zstandard dictionary:

test=# \dt+
                                List of tables
 Schema | Name  | Type  |  Owner   | Persistence | Access method | Size  | Description
--------+-------+-------+----------+-------------+---------------+-------+-------------
 public | temp  | table | nikhilkv | permanent   | heap          | 16 kB |
 public | temp1 | table | nikhilkv | permanent   | heap          | 16 kB |
(2 rows)

// Dictionary dependencies
test=# select * from pg_depend where refclassid = 9946;
 classid | objid | objsubid | refclassid | refobjid | refobjsubid | deptype
---------+-------+----------+------------+----------+-------------+---------
    1259 | 16389 |        0 |       9946 |        1 |           0 | n
    1259 | 16394 |        0 |       9946 |        2 |           0 | n
(2 rows)

// the corresponding dictionaries:
test=# select * from pg_zstd_dictionaries ;
 dictid |                              dict
--------+---------------------------------------------------------------
1 | \x37a430ec71451a10091010df303333b3770a33f1783c1e8fc7e3f1783ccff3bcf7d442414141414141414141414141414141414141414
14141414141a15028140a8542a15028140a85a2288aa2284a297d74e1e1e1e1e1e1e1e1e1e1e1e1e1e1e1e1e1e1e1f1783c1e8fc7e3f1789ee779ef01
0100000004000000080000004c6f72656d20697073756d20646f6c6f722073697420616d65742c20636f6e73656374657475722061646970697363696
e6720656c69742e204c6f72656d2069
2 | \x37a430ec7d1a933a091010df303333b3770a33f1783c1e8fc7e3f1783ccff3bcf7d442414141414141414141414141414141414141414
14141414141a15028140a8542a15028140a85a2288aa2284a297d74e1e1e1e1e1e1e1e1e1e1e1e1e1e1e1e1e1e1e1f1783c1e8fc7e3f1789ee779ef01
0100000004000000080000004e696b68696c206b756d616e722076656c64616e64612c206973206f6b61792063616e6469646174652c2068652069732
0696e2073656174746c65204e696b68696c20
(2 rows)

If cleanup_unused_dictionaries() is called while the dependencies
still exist, nothing is removed:

test=# select cleanup_unused_dictionaries();
cleanup_unused_dictionaries
-----------------------------
0
(1 row)

After dropping temp1, the associated dictionary becomes eligible for cleanup:

test=# drop table temp1;
DROP TABLE

test=# select cleanup_unused_dictionaries();
cleanup_unused_dictionaries
-----------------------------
1
(1 row)

________________________________
Addressing the compressed datum leak problem (via CTAS, INSERT INTO ... SELECT ...)

As compressed datums can be copied to other, unrelated tables via CTAS,
INSERT INTO ... SELECT, or CREATE TABLE ... EXECUTE, I have introduced a
function, inheritZstdDictionaryDependencies. It is invoked at
the end of such statements and ensures that any dictionary
dependencies of the source tables are copied to the destination table.
We determine the set of source tables from the relationOids field of
the PlannedStmt.

This guarantees that if compressed datums reference a zstd dictionary,
the destination table is marked as dependent on the dictionaries that
the source tables depend on, preventing premature removal by
cleanup_unused_dictionaries().
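To illustrate, the dependency-copying step can be sketched with the core systable-scan and dependency APIs. The real inheritZstdDictionaryDependencies in the patch may be shaped differently; everything here other than ZstdDictionariesRelationId is stock PostgreSQL API:

```c
/*
 * Sketch: for each source relation of the just-finished statement, copy
 * its table -> zstd-dictionary dependencies onto the destination table.
 * (Illustrative only; the actual patch code may differ.)
 */
static void
inheritZstdDictionaryDependencies(Oid dstRelid, List *relationOids)
{
	Relation	depRel = table_open(DependRelationId, AccessShareLock);
	ListCell   *lc;

	foreach(lc, relationOids)
	{
		Oid			srcRelid = lfirst_oid(lc);
		ScanKeyData key[2];
		SysScanDesc scan;
		HeapTuple	tup;

		/* Find all dependencies recorded for this source relation. */
		ScanKeyInit(&key[0], Anum_pg_depend_classid,
					BTEqualStrategyNumber, F_OIDEQ,
					ObjectIdGetDatum(RelationRelationId));
		ScanKeyInit(&key[1], Anum_pg_depend_objid,
					BTEqualStrategyNumber, F_OIDEQ,
					ObjectIdGetDatum(srcRelid));

		scan = systable_beginscan(depRel, DependDependerIndexId, true,
								  NULL, 2, key);
		while (HeapTupleIsValid(tup = systable_getnext(scan)))
		{
			Form_pg_depend dep = (Form_pg_depend) GETSTRUCT(tup);
			ObjectAddress dstObj;
			ObjectAddress dictObj;

			/* Only propagate dependencies on zstd dictionaries. */
			if (dep->refclassid != ZstdDictionariesRelationId)
				continue;

			ObjectAddressSet(dstObj, RelationRelationId, dstRelid);
			ObjectAddressSet(dictObj, ZstdDictionariesRelationId,
							 dep->refobjid);
			recordDependencyOn(&dstObj, &dictObj, DEPENDENCY_NORMAL);
		}
		systable_endscan(scan);
	}

	table_close(depRel, AccessShareLock);
}
```

In practice the function would also skip dependencies the destination table already has, so repeated copies do not accumulate duplicate pg_depend rows.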

Example: Consider two tables, each with its own dictionary:

                                List of tables
 Schema | Name  | Type  |  Owner   | Persistence | Access method | Size  | Description
--------+-------+-------+----------+-------------+---------------+-------+-------------
 public | temp  | table | nikhilkv | permanent   | heap          | 16 kB |
 public | temp1 | table | nikhilkv | permanent   | heap          | 16 kB |
(2 rows)

Using CTAS (CREATE TABLE AS), one table is copied to another. In this
case, the compressed datums in the temp table are copied to copy_tbl.
Since the dictionary is shared between two tables, a dependency on
that dictionary is also established for the destination table. Even if
the original temp table is deleted and cleanup is triggered, the
dictionary will not be dropped because there remains an active
dependency.

test=# create table copy_tbl as select * from temp;
SELECT 20

// dictid 1 is shared between two tables.
test=# select * from pg_depend where refclassid = 9946;
 classid | objid | objsubid | refclassid | refobjid | refobjsubid | deptype
---------+-------+----------+------------+----------+-------------+---------
    1259 | 16389 |        0 |       9946 |        1 |           0 | n
    1259 | 16404 |        0 |       9946 |        1 |           0 | n
    1259 | 16399 |        0 |       9946 |        3 |           0 | n
(3 rows)

// After dropping the temp table, where dictid 1 was used to compress datums
test=# drop table temp;
DROP TABLE

// The dependency for the temp table has been dropped.
test=# select * from pg_depend where refclassid = 9946;
 classid | objid | objsubid | refclassid | refobjid | refobjsubid | deptype
---------+-------+----------+------------+----------+-------------+---------
    1259 | 16404 |        0 |       9946 |        1 |           0 | n
    1259 | 16399 |        0 |       9946 |        3 |           0 | n
(2 rows)

// No dictionaries are being deleted.
test=# select cleanup_unused_dictionaries();
cleanup_unused_dictionaries
-----------------------------
0
(1 row)

Once the new copy_tbl is also deleted, the dictionary can be dropped
because no dependency exists on it:

test=# drop table copy_tbl;
DROP TABLE

// The dictionary is then deleted.
test=# select cleanup_unused_dictionaries();
cleanup_unused_dictionaries
-----------------------------
1
(1 row)

Here is another example using composite types, including a more complex
scenario involving two source tables.

// Create a base composite type with two text fields
test=# create type my_composite as (f1 text, f2 text);
CREATE TYPE

// Create a nested composite type that uses my_composite twice
test=# create type my_composite1 as (f1 my_composite, f2 my_composite);
CREATE TYPE

test=# \d my_composite
Composite type "public.my_composite"
Column | Type | Collation | Nullable | Default
--------+------+-----------+----------+---------
f1 | text | | |
f2 | text | | |

test=# \d my_composite1
Composite type "public.my_composite1"
Column | Type | Collation | Nullable | Default
--------+--------------+-----------+----------+---------
f1 | my_composite | | |
f2 | my_composite | | |

// Sample table with ZSTD dictionary compression on text columns
test=# \d+ orders
                                           Table "public.orders"
   Column    |  Type   | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
-------------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
 order_id    | integer |           |          |         | plain    |             |              |
 customer_id | integer |           |          |         | plain    |             |              |
 random1     | text    |           |          |         | extended | zstd        |              |
 random2     | text    |           |          |         | extended | zstd        |              |
Access method: heap

// Sample table with ZSTD dictionary compression on one of the text columns
test=# \d+ customers
                                         Table "public.customers"
   Column    |  Type   | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
-------------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
 customer_id | integer |           |          |         | plain    |             |              |
 random3     | text    |           |          |         | extended | zstd        |              |
 random4     | text    |           |          |         | extended |             |              |
Access method: heap

// Check existing dictionaries: dictid 1 for random1, dictid 2 for
// random2, dictid 3 for the random3 attribute
test=# select dictid from pg_zstd_dictionaries;
dictid
--------
1
2
3
(3 rows)

// List all objects dependent on ZSTD dictionaries
test=# select objid::regclass, * from pg_depend where refclassid = 9946;
   objid   | classid | objid | objsubid | refclassid | refobjid | refobjsubid | deptype
-----------+---------+-------+----------+------------+----------+-------------+---------
 orders    |    1259 | 16391 |        0 |       9946 |        1 |           0 | n
 orders    |    1259 | 16391 |        0 |       9946 |        2 |           0 | n
 customers |    1259 | 16396 |        0 |       9946 |        3 |           0 | n
(3 rows)

// Create new table using nested composite type
// This copies compressed datums into temp1.
test=# create table temp1 as
select ROW(
ROW(random3, random4)::my_composite,
ROW(random1, random2)::my_composite
)::my_composite1
from customers full outer join orders using (customer_id);
SELECT 51

test=# select objid::regclass, * from pg_depend where refclassid = 9946;
   objid   | classid | objid | objsubid | refclassid | refobjid | refobjsubid | deptype
-----------+---------+-------+----------+------------+----------+-------------+---------
 orders    |    1259 | 16391 |        0 |       9946 |        1 |           0 | n
 temp1     |    1259 | 16423 |        0 |       9946 |        1 |           0 | n
 orders    |    1259 | 16391 |        0 |       9946 |        2 |           0 | n
 temp1     |    1259 | 16423 |        0 |       9946 |        2 |           0 | n
 temp1     |    1259 | 16423 |        0 |       9946 |        3 |           0 | n
 customers |    1259 | 16396 |        0 |       9946 |        3 |           0 | n
(6 rows)

// Drop the original source tables.
test=# drop table orders;
DROP TABLE

test=# drop table customers ;
DROP TABLE

// Even after dropping orders and customers, temp1 still holds
// references to the dictionaries.
test=# select objid::regclass, * from pg_depend where refclassid = 9946;
 objid | classid | objid | objsubid | refclassid | refobjid | refobjsubid | deptype
-------+---------+-------+----------+------------+----------+-------------+---------
 temp1 |    1259 | 16423 |        0 |       9946 |        1 |           0 | n
 temp1 |    1259 | 16423 |        0 |       9946 |        2 |           0 | n
 temp1 |    1259 | 16423 |        0 |       9946 |        3 |           0 | n
(3 rows)

// Attempt cleanup: nothing is removed, because temp1 still
// depends on the dictionaries.
test=# select cleanup_unused_dictionaries();
cleanup_unused_dictionaries
-----------------------------
0
(1 row)

test=# select dictid from pg_zstd_dictionaries ;
dictid
--------
1
2
3
(3 rows)

// Drop the destination table
test=# drop table temp1;
DROP TABLE

// Confirm no remaining dependencies
test=# select objid::regclass, * from pg_depend where refclassid = 9946;
 objid | classid | objid | objsubid | refclassid | refobjid | refobjsubid | deptype
-------+---------+-------+----------+------------+----------+-------------+---------
(0 rows)

// Cleanup now succeeds
test=# select cleanup_unused_dictionaries();
cleanup_unused_dictionaries
-----------------------------
3
(1 row)

test=# select dictid from pg_zstd_dictionaries ;
dictid
--------
(0 rows)

This design ensures that:

* Dictionaries are only deleted when no table depends on them.
* We avoid the costly decompression/recompression that would otherwise
be needed to prevent compressed datum leakage.
* We do not retain dictionaries forever.

These changes are the core additions in this revision of the patch,
addressing the concerns around long-lived dictionaries and compressed
datum leakage. Additionally, this update incorporates feedback by
enabling automatic zstd dictionary generation and cleanup during
VACUUM, and includes changes to support copying zstd dictionaries
during pg_upgrade.

Patch summary:

v11-0001-varattrib_4b-changes-and-macros-update-needed-to.patch
Refactors varattrib_4b structures and updates related macros to enable
ZSTD dictionary support.
v11-0002-Zstd-compression-and-decompression-routines-incl.patch
Adds ZSTD compression and decompression routines, and introduces a new
catalog to store dictionary metadata.
v11-0003-Zstd-dictionary-training-process.patch
Implements the dictionary training workflow. Includes built-in support
for text and jsonb types. Allows users to define custom sampling
functions per type by specifying a C function name in the
pg_type.typzstdsampling field.
v11-0004-Dependency-tracking-mechanism-to-track-compresse.patch
Introduces a dependency tracking mechanism using pg_depend to record
which ZSTD dictionaries a table depends on. When compressed datums
that rely on a dictionary are copied to unrelated target tables, the
corresponding dictionary dependencies from the source table are also
recorded for the target table, ensuring the dictionaries are not
prematurely cleaned up.
v11-0005-generate-and-cleanup-dictionaries-using-vacuum.patch
Adds integration with VACUUM to automatically generate and clean up
ZSTD dictionaries.
v11-0006-pg_dump-pg_upgrade-needed-changes-to-support-new.patch
Extends pg_dump and pg_upgrade to support migrating ZSTD dictionaries
and their dependencies during pg_upgrade.
v11-0007-Some-tests-related-to-zstd-dictionary-based-comp.patch
Provides test coverage for ZSTD dictionary-based compression features,
including training, usage, and cleanup.

I hope these changes address your concerns; any thoughts or
suggestions on this approach are welcome.

Best regards,
Nikhil Veldanda

On Mon, Mar 17, 2025 at 1:03 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Fri, Mar 7, 2025 at 8:36 PM Nikhil Kumar Veldanda
> <veldanda(dot)nikhilkumar17(at)gmail(dot)com> wrote:
> > struct						/* Extended compression format */
> > {
> > 	uint32		va_header;
> > 	uint32		va_tcinfo;
> > 	uint32		va_cmp_alg;
> > 	uint32		va_cmp_dictid;
> > 	char		va_data[FLEXIBLE_ARRAY_MEMBER];
> > } va_compressed_ext;
> > } varattrib_4b;
>
> First, thanks for sending along the performance results. I agree that
> those are promising. Second, thanks for sending these design details.
>
> The idea of keeping dictionaries in pg_zstd_dictionaries literally
> forever doesn't seem very appealing, but I'm not sure what the other
> options are. I think we've established in previous work in this area
> that compressed values can creep into unrelated tables and inside
> records or other container types like ranges. Therefore, we have no
> good way of knowing when a dictionary is unreferenced and can be
> dropped. So in that sense your decision to keep them forever is
> "right," but it's still unpleasant. It would even be necessary to make
> pg_upgrade carry them over to new versions.
>
> If we could make sure that compressed datums never leaked out into
> other tables, then tables could depend on dictionaries and
> dictionaries could be dropped when there were no longer any tables
> depending on them. But like I say, previous work suggested that this
> would be very difficult to achieve. However, without that, I imagine
> users generating new dictionaries regularly as the data changes and
> eventually getting frustrated that they can't get rid of the old ones.
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com

Attachment Content-Type Size
v11-0005-generate-and-cleanup-dictionaries-using-vacuum.patch application/octet-stream 3.7 KB
v11-0003-Zstd-dictionary-training-process.patch application/octet-stream 41.5 KB
v11-0007-Some-tests-related-to-zstd-dictionary-based-comp.patch application/octet-stream 71.9 KB
v11-0006-pg_dump-pg_upgrade-needed-changes-to-support-new.patch application/octet-stream 13.5 KB
v11-0004-Dependency-tracking-mechanism-to-track-compresse.patch application/octet-stream 11.5 KB
v11-0001-varattrib_4b-changes-and-macros-update-needed-to.patch application/octet-stream 4.8 KB
v11-0002-Zstd-compression-and-decompression-routines-incl.patch application/octet-stream 51.7 KB
