Re: Add ZSON extension to /contrib/

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Aleksander Alekseev <aleksander(at)timescale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add ZSON extension to /contrib/
Date: 2021-05-25 20:08:24
Message-ID: 6f3944ad-6924-5fed-580c-e72477733f04@dunslane.net
Lists: pgsql-hackers


On 5/25/21 6:55 AM, Aleksander Alekseev wrote:
> Hi hackers,
>
> Back in 2016 while being at PostgresPro I developed the ZSON extension
> [1]. The extension introduces the new ZSON type, which is 100%
> compatible with JSONB but uses a shared dictionary of strings most
> frequently used in given JSONB documents for compression. These
> strings are replaced with integer IDs. Afterward, PGLZ (and now LZ4)
> applies if the document is large enough by common PostgreSQL logic.
> Under certain conditions (many large documents), this saves disk
> space, memory and increases the overall performance. More details can
> be found in README on GitHub.
>
> The extension was warmly received, and almost immediately I got several
> requests to submit it to /contrib/ so people using Amazon RDS and
> similar services could enjoy it too. Back then I was not sure if the
> extension is mature enough and if it lacks any additional features
> required to solve the real-world problems of the users. Time showed,
> however, that people are happy with the extension as it is. There were
> several minor issues discovered, but they were fixed back in 2017. The
> extension never experienced any compatibility problems with the next
> major release of PostgreSQL.
>
> So my question is if the community may consider adding ZSON to
> /contrib/. If this is the case I will add this thread to the nearest
> CF and submit a corresponding patch.
>
> [1]: https://github.com/postgrespro/zson
>

We (2ndQuadrant, now part of EDB) made some enhancements to ZSON a few years ago, and I have permission to contribute those if this proposal is adopted. From the readme:

1. There is an option to make zson_learn only process object keys,
rather than field values.

```
select zson_learn('{{table1,col1}}',true);
```

2. Strings with an octet length of less than 3 are not processed.
Since a string is encoded as 2 bytes, and another byte is then needed
to hold the length of the following skipped bytes, encoding values
shorter than 3 bytes would be a net loss.

3. There is a new function to create a dictionary directly from an
array of text, rather than using the learning code:

```
select zson_create_dictionary(array['word1','word2']::text[]);
```

4. There is a function to augment the current dictionary from an array of text:

```
select zson_extend_dictionary(array['value1','value2','value3']::text[]);
```

This is particularly useful for adding common field prefixes or values. A good
example of field prefixes is URL values where the first part of the URL is
fairly constrained but the last part is not.
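
As a rough sketch of the URL prefix case, something like the following could be used. The prefixes are invented for illustration, and this assumes a dictionary already exists to be extended:

```
-- Hypothetical URL prefixes; substitute whatever is common in your own data.
select zson_extend_dictionary(array[
    'https://example.com/products/',
    'https://example.com/users/'
]::text[]);
```

Both entries are well over the 3-byte minimum from point 2, so they are long enough to be worth encoding.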

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
