From: | Aleksander Alekseev <aleksander(at)timescale(dot)com> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Cc: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com> |
Subject: | [PATCH] Compression dictionaries for JSONB |
Date: | 2022-04-22 08:30:01 |
Message-ID: | CAJ7c6TOtAB0z1UrksvGTStNE-herK-43bj22=5xVBg7S4vr5rQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi hackers,
This is a follow-up thread to `RFC: compression dictionaries for JSONB`
[1]. I would like to share my current progress in order to get early
feedback. The patch is currently in a draft state but implements the basic
functionality. I did my best to account for all the great feedback I
previously got from Alvaro and Matthias.
Usage example:
```
CREATE TYPE mydict AS DICTIONARY OF jsonb ('aaa', 'bbb');
SELECT '{"aaa":"bbb"}' :: mydict;
mydict
----------------
{"aaa": "bbb"}
SELECT ('{"aaa":"bbb"}' :: mydict) -> 'aaa';
?column?
----------
"bbb"
```
Here `mydict` works as a transparent replacement for `jsonb`. However, its
internal representation differs. The provided dictionary entries ('aaa',
'bbb') are stored in the new catalog table:
```
SELECT * FROM pg_dict;
oid | dicttypid | dictentry
-------+-----------+-----------
39476 | 39475 | aaa
39477 | 39475 | bbb
(2 rows)
```
When `mydict` sees 'aaa' in the document, it replaces it with the
corresponding code, in this case - 39476. For more details regarding the
compression algorithm and choosen compromises please see the comments in
the patch.
In pg_type `mydict` has typtype = TYPTYPE_DICT. It works the same way as
TYPTYPE_BASE with only difference: corresponding `<type>_in`
(pg_type.typinput) and `<another-type>_<type>` (pg_cast.castfunc)
procedures receive the dictionary Oid as a `typmod` argument. This way the
procedures can distinguish `mydict1` from `mydict2` and use the proper
compression dictionary.
The approach with alternative `typmod` role is arguably a bit hacky, but it
was the less invasive way to implement the feature I've found. I'm open to
alternative suggestions.
Current limitations (todo):
- ALTER TYPE is not implemented
- Tests and documentation are missing
- Autocomplete is missing
Future work (out of scope of this patch):
- Support types other than JSONB: TEXT, XML, etc
- Automatically updated dictionaries, e.g. during VACUUM
- Alternative compression algorithms. Note that this will not require any
further changes in the catalog, only the values we write to pg_type and
pg_cast will differ.
Open questions:
- Dictionary entries are currently stored as NameData, the same type that
is used for enums. Are we OK with the accompanying limitations? Any
alternative suggestions?
- All in all, am I moving the right direction?
Your feedback is very much welcomed!
[1]:
https://postgr.es/m/CAJ7c6TPx7N-bVw0dZ1ASCDQKZJHhBYkT6w4HV1LzfS%2BUUTUfmA%40mail.gmail.com
--
Best regards,
Aleksander Alekseev
Attachment | Content-Type | Size |
---|---|---|
v1-0001-CREATE-TYPE-foo-AS-DICTIONARY-OF-JSONB.patch | application/octet-stream | 45.7 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Aleksander Alekseev | 2022-04-22 08:33:56 | Re: RFC: compression dictionaries for JSONB |
Previous Message | Magnus Hagander | 2022-04-22 07:49:34 | Re: Estimating HugePages Requirements? |