Re: [HACKERS] Custom compression methods

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, David Steele <david(at)pgmasters(dot)net>, Ildus Kurbangaliev <i(dot)kurbangaliev(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [HACKERS] Custom compression methods
Date: 2021-03-15 22:58:35
Message-ID: 20210315225835.2toe2ceanp7gi6fw@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2021-03-15 15:29:05 -0400, Robert Haas wrote:
> On Mon, Mar 15, 2021 at 8:14 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > In the attached patches I have changed this, ...
>
> OK, so just looking over this patch series, here's what I think:
>
> - 0001 and 0002 are now somewhat independent of the rest of this work,
> and could be dropped, but I think they're a good idea, so I'd like to
> commit them. I went over 0001 carefully this morning and didn't find
> any problems. I still need to do some more review of 0002.

I don't particularly like PG_RETURN_HEAPTUPLEHEADER_RAW(). What is "raw"
about it? It also seems to me like there needs to at least be a
sentence or two explaining when to use which of the functions.

I think heap_copy_tuple_as_raw_datum() should grow an assert checking
there are no external columns?

The commit messages could use a bit more explanation about motivation.

I'm don't like that after 0002 ExecEvalRow(), ExecEvalFieldStoreForm()
contain a nearly identical copy of the same code. And
make_tuple_from_row() also is similar. It seem that there should be a
heap_form_tuple() version doing this for us?

> - 0003 through 0005 are the core of this patch set. I'd like to get
> them into this release, but I think we're likely to run out of time.

Comments about 0003:
- why is HIDE_TOAST_COMPRESSION useful? Doesn't quite seem to be
comparable to HIDE_TABLEAM?

- (you comment on this later): toast_get_compression_method() needing to
fetch some of the data to figure out the compression method is pretty
painful. Especially because it then goes and throws away that data!

- Adding all these indirect function calls via toast_compression[] just
for all of two builtin methods isn't fun either.

- I guess NO_LZ4_SUPPORT() is a macro so it shows the proper
file/function name?

- I wonder if adding compression to the equalTupleDesc() is really
necessary / won't cause problems (thinking of cases like the
equalTupleDesc() call in pg_proc.c).

- Is nodeModifyTable.c really the right place for the logic around
CompareCompressionMethodAndDecompress()? And is doing it in every
place that does "user initiated" inserts really the right way? Why
isn't this done on the tuptoasting level?

- CompareCompressionMethodAndDecompress() is pretty deeply
indented. Perhaps rewrite a few more of the conditions to be
continue;?

Comments about 0005:
- I'm personally not really convinced tracking the compression type in
pg_attribute the way you do is really worth it (. Especially given
that it's right now only about new rows anyway. Seems like it'd be
easier to just treat it as a default for new rows, and dispense with
all the logic around mismatching compression types etc?

> The biggest thing that jumps out at me while looking at this with
> fresh eyes is that the patch doesn't touch varatt_external.va_extsize
> at all. In a varatt_external, we can't use the va_rawsize to indicate
> the compression method, because there are no bits free, because the 2
> bits not required to store the size are used to indicate what type of
> varlena we've got.

Once you get to varatt_external, you could also just encode it via
vartag_external...

> But, that means that the size of a varlena is limited to 1GB, so there
> are 2 bits free in varatt_external.va_extsize, just like there are in
> va_compressed.va_rawsize. We could store the same two bits in
> varatt_external.va_extsize that we're storing in
> va_compressed.va_rawsize aka va_tcinfo. That's a big deal, because
> then toast_get_compression_method() doesn't have to call
> toast_fetch_datum_slice() any more, which is a rather large savings.
> If it's only impacting pg_column_compression() then whatever, but
> that's not the case: we've got calls to
> CompareCompressionMethodAndDecompress in places like intorel_receive()
> and ExecModifyTable() that look pretty performance-critical.

Yea, I agree, that does seem problematic.

> There's another, rather brute-force approach to this problem, too. We
> could just decide that lz4 will only be used for external data, and
> that there's no such thing as an inline-compressed lz4 varlena.
> deotast_fetch_datum() would just notice that the value is lz4'd and
> de-lz4 it before returning it, since a compressed lz4 datum is
> impossible.

That seems fairly terrible.

> I'm open to being convinced that we don't need to do either of these
> things, and that the cost of iterating over all varlenas in the tuple
> is not so bad as to preclude doing things as you have them here. But,
> I'm afraid it's going to be too expensive.

I mean, I would just define several of those places away by not caring
about tuples in a different compressino formation ending up in a
table...

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2021-03-15 23:04:37 Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY
Previous Message Justin Pryzby 2021-03-15 22:49:45 Re: [PATCH]: Allow errors in parameter values to be reported during the BIND phase itself..