From: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
---|---|
To: | Ildus Kurbangaliev <i(dot)kurbangaliev(at)postgrespro(dot)ru> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: [HACKERS] Custom compression methods |
Date: | 2017-11-21 17:47:49 |
Message-ID: | bfb3d34e-ba74-02cb-ca2a-43344866d9fd@2ndquadrant.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On 11/21/2017 03:47 PM, Ildus Kurbangaliev wrote:
> On Mon, 20 Nov 2017 00:04:53 +0100
> Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>
> ...
>
>> 6) I'm rather confused by AttributeCompression vs.
>> ColumnCompression. I mean, attribute==column, right? Of course, one
>> is for data from parser, the other one is for internal info. But
>> can we make the naming clearer?
>
> For now I have renamed AttributeCompression to CompressionOptions,
> not sure that's a good name but at least it gives less confusion.
>
I propose to use either
CompressionMethodOptions (and CompressionMethodRoutine)
or
CompressionOptions (and CompressionRoutine)
>>
>> 7) The docs in general are somewhat unsatisfactory, TBH. For example
>> the ColumnCompression has no comments, unlike everything else in
>> parsenodes. Similarly for the SGML docs - I suggest to expand them to
>> resemble FDW docs
>> (https://www.postgresql.org/docs/10/static/fdwhandler.html) which
>> also follows the handler/routines pattern.
>
> I've added more comments. I think I'll add more documentation if the
> committers will approve current syntax.
>
OK. Haven't reviewed this yet.
>>
>> 8) One of the unclear things if why we even need 'drop' routing. It
>> seems that if it's defined DropAttributeCompression does something.
>> But what should it do? I suppose dropping the options should be done
>> using dependencies (just like we drop columns in this case).
>>
>> BTW why does DropAttributeCompression mess with att->attisdropped in
>> this way? That seems a bit odd.
>
> 'drop' routine could be useful. An extension could do something
> related with the attribute, like remove extra tables or something
> else. The compression options will not be removed after unlinking
> compression method from a column because there is still be stored
> compressed data in that column.
>
OK. So something like a "global" dictionary used for the column, or
something like that? Sure, seems useful and I've been thinking about
that, but I think we badly need some extension using that, even if in a
very simple way. Firstly, we need a "how to" example, secondly we need
some way to test it.
>>
>> 13) When writing the experimental extension, I was extremely
>> confused about the regular varlena headers, custom compression
>> headers, etc. In the end I stole the code from tsvector.c and
>> whacked it a bit until it worked, but I wouldn't dare to claim I
>> understand how it works.
>>
>> This needs to be documented somewhere. For example postgres.h has
>> a bunch of paragraphs about varlena headers, so perhaps it should
>> be there? I see the patch tweaks some of the constants, but does
>> not update the comment at all.
>
> This point is good, I'm not sure how this documentation should look
> like. I've just assumed that people should have deep undestanding of
> varlenas if they're going to compress them. But now it's easy to
> make mistake there. Maybe I should add some functions that help to
> construct varlena, with different headers. I like the way is how
> jsonb is constructed. It uses StringInfo and there are few helper
> functions (reserveFromBuffer, appendToBuffer and others). Maybe they
> should be not static.
>
Not sure. My main problem was not understanding how this affects the
varlena header, etc. And I had no idea where to look.
>>
>> Perhaps it would be useful to provide some additional macros
>> making access to custom-compressed varlena values easier. Or
>> perhaps the VARSIZE_ANY / VARSIZE_ANY_EXHDR / VARDATA_ANY already
>> support that? This part is not very clear to me.
>
> These macros will work, custom compressed varlenas behave like old
> compressed varlenas.
>
OK. But then I don't understand why tsvector.c does things like
VARSIZE(data) - VARHDRSZ_CUSTOM_COMPRESSED - arrsize
VARRAWSIZE_4B_C(data) - arrsize
instead of
VARSIZE_ANY_EXHDR(data) - arrsize
VARSIZE_ANY(data) - arrsize
Seems somewhat confusing.
>>> Still it's a problem if the user used for example `SELECT
>>> <compressed_column> INTO * FROM *` because postgres will copy
>>> compressed tuples, and there will not be any dependencies
>>> between destination and the options.
>>>
>>
>> This seems like a rather fatal design flaw, though. I'd say we need
>> to force recompression of the data, in such cases. Otherwise all
>> the dependency tracking is rather pointless.
>
> Fixed this problem too. I've added recompression for datum that use
> custom compression.
>
Hmmm, it still doesn't work for me. See this:
test=# create extension pg_lz4 ;
CREATE EXTENSION
test=# create table t_lz4 (v text compressed lz4);
CREATE TABLE
test=# create table t_pglz (v text);
CREATE TABLE
test=# insert into t_lz4 select repeat(md5(1::text),300);
INSERT 0 1
test=# insert into t_pglz select * from t_lz4;
INSERT 0 1
test=# drop extension pg_lz4 cascade;
NOTICE: drop cascades to 2 other objects
DETAIL: drop cascades to compression options for lz4
drop cascades to table t_lz4 column v
DROP EXTENSION
test=# \c test
You are now connected to database "test" as user "user".
test=# insert into t_lz4 select repeat(md5(1::text),300);^C
test=# select * from t_pglz ;
ERROR: cache lookup failed for compression options 16419
That suggests no recompression happened.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Robert Haas | 2017-11-21 17:54:54 | Re: View with duplicate GROUP BY entries |
Previous Message | Stephen Frost | 2017-11-21 17:28:03 | Re: to_typemod(type_name) information function |