From: | Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, David Steele <david(at)pgmasters(dot)net>, Ildus Kurbangaliev <i(dot)kurbangaliev(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: [HACKERS] Custom compression methods |
Date: | 2021-03-18 11:10:41 |
Message-ID: | CAFiTN-sBH65e4B5kQB5MT8BQOPajkrKvSdKA-07+ra3f2x4vCA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Mar 18, 2021 at 1:32 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> ).On Mon, Mar 15, 2021 at 6:58 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > - Adding all these indirect function calls via toast_compression[] just
> > for all of two builtin methods isn't fun either.
>
> Yeah, it feels like this has too many layers of indirection now. Like,
> toast_decompress_datum() first gets TOAST_COMPRESS_METHOD(attr). Then
> it calls CompressionIdToMethod to convert one constant (like
> TOAST_PGLZ_COMPRESSION_ID) to another constant with a slightly
> different name (like TOAST_PGLZ_COMPRESSION). Then it calls
> GetCompressionRoutines() to get hold of the function pointers. Then it
> does an indirect functional call. That seemed like a pretty reasonable
> idea when we were trying to support arbitrary compression AMs without
> overly privileging the stuff that was built into core, but if we're
> just doing stuff that's built into core, then we could just switch
> (TOAST_COMPRESS_METHOD(attr)) and call the correct function. In fact,
> we could even move the stuff from toast_compression.c into detoast.c,
> which would allow the compiler to optimize better (e.g. by inlining,
> if it wants).
>
> The same applies to toast_decompress_datum_slice().
Changed this, but I have still kept the functions in
toast_compression.c. I think keeping compression related
functionality in a separate file looks much cleaner. Please have a
look and let me know that whether you still feel we should move it ti
detoast.c. If the reason is that we can inline, then I feel we are
already paying cost of compression/decompression and compare to that
in lining a function will not make much difference.
> There's a similar issue in toast_get_compression_method() and the only
> caller, pg_column_compression(). Here the multiple mapping layers and
> the indirect function call are split across those two functions rather
> than all in the same one, but here again one could presumably find a
> place to just switch on TOAST_COMPRESS_METHOD(attr) or
> VARATT_EXTERNAL_GET_COMPRESSION(attr) and return "pglz" or "lz4"
> directly.
I have simplified that, only one level of function call from
pg_column_compression, I have kept a toast_get_compression_id
function because in later patch 0005, we will be using that for
getting the compression id from the compressed data.
> In toast_compress_datum(), I think we could have a switch that invokes
> the appropriate compressor based on cmethod and sets a variable to the
> value to be passed as the final argument of
> TOAST_COMPRESS_SET_SIZE_AND_METHOD().
Done
> Likewise, I suppose CompressionNameToMethod could at least be
> simplified to use constant strings rather than stuff like
> toast_compression[TOAST_PGLZ_COMPRESSION_ID].cmname.
Done
Other changes:
- As suggested by Andres, remove compression method comparision from
eualTupleDesc, because it is not required now.
- I found one problem in existing patch, the problem was in
detoast_attr_slice, if externally stored data is compressed then we
compute max possible compressed size to fetch based on the slice
length, for that we were using pglz_maximum_compressed_size, which is
not correct for lz4. For lz4, I think we need to fetch the complete
compressed data. We might think that for lz4 we might compute lie
Min(LZ4_compressBound(slicelength, total_compressed_size); But IMHO,
we can not do that and the reason is same that why we should not use
PGLZ_MAX_OUTPUT for pglz (explained in the comment atop
pglz_maximum_compressed_size).
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Attachment | Content-Type | Size |
---|---|---|
v37-0001-Get-datum-from-tuple-which-doesn-t-contain-exter.patch | text/x-patch | 37.4 KB |
v37-0003-Built-in-compression-method.patch | text/x-patch | 104.5 KB |
v37-0004-Add-default_toast_compression-GUC.patch | text/x-patch | 11.8 KB |
v37-0005-Alter-table-set-compression.patch | text/x-patch | 21.5 KB |
v37-0002-Expand-the-external-data-before-forming-the-tupl.patch | text/x-patch | 8.2 KB |
v37-0006-default-to-with-lz4.patch | text/x-patch | 1.7 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Amit Kapila | 2021-03-18 11:11:41 | Re: Parallel INSERT (INTO ... SELECT ...) |
Previous Message | Thomas Munro | 2021-03-18 11:05:11 | Re: fdatasync performance problem with large number of DB files |