From: Nikhil Kumar Veldanda <veldanda(dot)nikhilkumar17(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: ZStandard (with dictionaries) compression support for TOAST compression
Date: 2025-03-06 20:59:01
Message-ID: CAFAfj_GACKVftwuRjy3Ls-1Xc3ojUUbVh=Rm7KpRuYbaS=uLPg@mail.gmail.com
Lists: pgsql-hackers
Hi Robert,
> I think that solving the problems around using a dictionary is going
> to be really hard. Can we see some evidence that the results will be
> worth it?
With the latest patch I've shared, using a Kaggle dataset of
Nintendo-related tweets [1], we leveraged PostgreSQL's
acquire_sample_rows function to quickly gather just 1,000 sample rows
for a specific attribute out of 104,695 rows. These raw samples were
passed to zstd's dictionary trainer as the sample buffer, which
generated a custom dictionary. That dictionary was then used directly
to compress the documents, yielding roughly 62% space savings after
compression:
```
test=# \dt+
                                       List of tables
 Schema |      Name      | Type  |  Owner   | Persistence | Access method |  Size  | Description
--------+----------------+-------+----------+-------------+---------------+--------+-------------
 public | lz4            | table | nikhilkv | permanent   | heap          | 297 MB |
 public | pglz           | table | nikhilkv | permanent   | heap          | 259 MB |
 public | zstd_with_dict | table | nikhilkv | permanent   | heap          | 114 MB |
 public | zstd_wo_dict   | table | nikhilkv | permanent   | heap          | 210 MB |
(4 rows)
```
We've observed similarly strong results using dictionaries on other
datasets as well; the zstd calls involved are sketched below.
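The dictionary training and per-datum compression described above map
onto zstd's standard dictionary API. Here is a minimal standalone
sketch of that flow, separate from the patch itself; the fabricated
JSON samples, dictionary capacity, and compression level are purely
illustrative:
```
/*
 * Standalone sketch (not the patch itself) of the zstd dictionary flow
 * described above: concatenate the sampled values into one buffer, train
 * a dictionary with ZDICT_trainFromBuffer(), then compress individual
 * datums with ZSTD_compress_usingDict().  Build with -lzstd.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <zstd.h>
#include <zdict.h>

#define NB_SAMPLES 1000

int
main(void)
{
    static char rows[NB_SAMPLES][128];
    size_t  sampleSizes[NB_SAMPLES];
    size_t  totalSize = 0;

    /* Fabricate ~1,000 small JSON "rows" standing in for the sampled tuples. */
    for (int i = 0; i < NB_SAMPLES; i++)
    {
        sampleSizes[i] = (size_t) snprintf(rows[i], sizeof(rows[i]),
                                           "{\"id\": %d, \"lang\": \"en\", "
                                           "\"text\": \"Nintendo tweet number %d\"}",
                                           i, i);
        totalSize += sampleSizes[i];
    }

    /* ZDICT expects all samples concatenated back to back in one buffer. */
    char   *sampleBuf = malloc(totalSize);
    size_t  off = 0;

    for (int i = 0; i < NB_SAMPLES; i++)
    {
        memcpy(sampleBuf + off, rows[i], sampleSizes[i]);
        off += sampleSizes[i];
    }

    /* Train a small custom dictionary from the samples. */
    size_t  dictCapacity = 4096;
    void   *dict = malloc(dictCapacity);
    size_t  dictSize = ZDICT_trainFromBuffer(dict, dictCapacity,
                                             sampleBuf, sampleSizes,
                                             NB_SAMPLES);

    if (ZDICT_isError(dictSize))
    {
        fprintf(stderr, "training failed: %s\n", ZDICT_getErrorName(dictSize));
        return 1;
    }

    /* Compress one datum with and without the dictionary to compare. */
    const char *datum = rows[42];
    size_t  srcSize = strlen(datum);
    size_t  dstCapacity = ZSTD_compressBound(srcSize);
    void   *dst = malloc(dstCapacity);
    ZSTD_CCtx *cctx = ZSTD_createCCtx();

    size_t  plain = ZSTD_compressCCtx(cctx, dst, dstCapacity,
                                      datum, srcSize, 3);
    size_t  withDict = ZSTD_compress_usingDict(cctx, dst, dstCapacity,
                                               datum, srcSize,
                                               dict, dictSize, 3);

    if (ZSTD_isError(plain) || ZSTD_isError(withDict))
    {
        fprintf(stderr, "compression failed\n");
        return 1;
    }

    printf("raw %zu bytes, no dict %zu bytes, with dict %zu bytes (dict %zu bytes)\n",
           srcSize, plain, withDict, dictSize);

    ZSTD_freeCCtx(cctx);
    free(dst);
    free(dict);
    free(sampleBuf);
    return 0;
}
```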
[1] https://www.kaggle.com/code/dcalambas/nintendo-tweets-analysis/data
---
Nikhil Veldanda