From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Ildus Kurbangaliev <i(dot)kurbangaliev(at)postgrespro(dot)ru>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Custom compression methods
Date: 2017-12-14 17:23:30
Message-ID: f86f1a2f-9510-e712-2a53-bfc17f6d2414@2ndquadrant.com
Lists: pgsql-hackers

On 12/14/2017 04:21 PM, Robert Haas wrote:
> On Wed, Dec 13, 2017 at 5:10 AM, Tomas Vondra
> <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>>> 2. If several data types can benefit from a similar approach, it has
>>> to be separately implemented for each one.
>>
>> I don't think the current solution improves that, though. If you
>> want to exploit internal features of individual data types, it
>> pretty much requires code customized to every such data type.
>>
>> For example you can't take the tsvector compression and just slap
>> it on tsquery, because it relies on knowledge of internal tsvector
>> structure. So you need separate implementations anyway.
>
> I don't think that's necessarily true. Certainly, it's true that
> *if* tsvector compression depends on knowledge of internal tsvector
> structure, *then* you can't use the implementation for anything
> else (this, by the way, means that there needs to be some way for a
> compression method to reject being applied to a column of a data
> type it doesn't like).
I believe such a dependency on implementation details is pretty much
the main benefit of datatype-aware compression methods. If a method
doesn't rely on such assumptions, then I'd say it's a general-purpose
compression method.
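
To make the "reject a data type" point above concrete, a
datatype-aware method could expose a check callback that gets invoked
when the method is attached to a column. A minimal sketch, with
entirely hypothetical names (this is not what the patch implements):

    /*
     * Hypothetical sketch only - not an actual API.  The compression
     * method's handler returns a set of callbacks, including a
     * "check" that lets the method reject data types it does not
     * understand.
     */
    #include "postgres.h"
    #include "catalog/pg_type.h"

    typedef struct CompressionRoutine      /* hypothetical name */
    {
        /* may this method be used for a column of the given type? */
        bool    (*check) (Oid typeoid);
        bytea  *(*compress) (const bytea *value);
        bytea  *(*decompress) (const bytea *value);
    } CompressionRoutine;

    /* a tsvector-specific method accepts only tsvector columns */
    static bool
    tsvector_cm_check(Oid typeoid)
    {
        return (typeoid == TSVECTOROID);
    }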
> However, it seems possible to imagine compression algorithms that can
> work for a variety of data types, too. There might be a compression
> algorithm that is theoretically a general-purpose algorithm but has
> features which are particularly well-suited to, say, JSON or XML
> data, because it looks for word boundaries to decide on what strings
> to insert into the compression dictionary.
>
Can you give an example of such an algorithm? I haven't seen one, and
I find arguments based on hypothetical compression methods somewhat
suspicious.
FWIW I'm not against considering such compression methods, but OTOH
this may not be a great primary use case to drive the overall design.
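
For the record, my understanding of the word-boundary idea is roughly
this: split the input into words and treat frequent words as
candidates for the compression dictionary. A self-contained toy
sketch of just the tokenization step (illustrative only, not a real
compression method):

    #include <ctype.h>
    #include <stdio.h>

    int
    main(void)
    {
        const char *input = "{\"name\": \"json\", \"type\": \"json\"}";
        const char *p = input;

        while (*p)
        {
            /* skip anything that is not a word character */
            while (*p && !isalnum((unsigned char) *p))
                p++;

            /*
             * Collect one word - a real method would count word
             * frequencies and assign short dictionary references to
             * the most common ones.
             */
            const char *start = p;

            while (*p && isalnum((unsigned char) *p))
                p++;

            if (p > start)
                printf("dictionary candidate: %.*s\n",
                       (int) (p - start), start);
        }

        return 0;
    }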
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services