From: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
---|---|
To: | Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, David Steele <david(at)pgmasters(dot)net>, Ildus Kurbangaliev <i(dot)kurbangaliev(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: [HACKERS] Custom compression methods |
Date: | 2020-10-04 22:07:13 |
Message-ID: | 20201004220713.6vlmm2e3amlz2dil@development |
Lists: | pgsql-hackers |
Hi,
I took a look at this patch after a long time and did a bit of review
and testing. I haven't re-read the whole thread since 2017, so some of
the following comments might be mistaken - sorry about that :-(
1) The "cmapi.h" naming seems unnecessarily short. I'd suggest simply
using "compression" or something like that. I see little reason to
shorten "compression" to "cm", or to prefix files with "cm_". For
example, compression/cm_zlib.c might just be compression/zlib.c.
2) I see index_form_tuple does this:
    Datum cvalue = toast_compress_datum(untoasted_values[i],
                                        DefaultCompressionMethod);
which seems wrong - why shouldn't the indexes use the same compression
method as the underlying table?
3) dumpTableSchema in pg_dump.c does this:
    switch (tbinfo->attcompression[j])
    {
        case 'p':
            cmname = "pglz";
        case 'z':
            cmname = "zlib";
    }
which is broken because the 'p' case is missing a break, so 'p' falls
through and produces 'zlib'.
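For reference, a corrected version of that switch, with a break after each case and an explicit default so an unknown code cannot leave cmname unset. It is wrapped in a standalone function (the name `compression_method_name` is hypothetical) so it compiles on its own; the 'p'/'z' codes are the ones from the quoted snippet:

```c
#include <stddef.h>

/* Map a one-character attcompression code to the method name pg_dump
 * should emit; returns NULL for an unrecognized code. */
static const char *
compression_method_name(char code)
{
    const char *cmname;

    switch (code)
    {
        case 'p':
            cmname = "pglz";
            break;          /* without this, 'p' falls through to "zlib" */
        case 'z':
            cmname = "zlib";
            break;
        default:
            cmname = NULL;  /* caller should report an error */
            break;
    }
    return cmname;
}
```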
4) The name ExecCompareCompressionMethod is somewhat misleading, as the
function is not merely comparing compression methods - it also
recompresses the data.
5) CheckCompressionMethodsPreserved should document what the return
value means (true when the new list contains all the old values, thus
not requiring a rewrite). Maybe "Compare" would be a better name?
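To make the suggested documentation concrete, here is a sketch of such a check, assuming both lists are arrays of one-character method codes (the function name `compression_methods_preserved` and this representation are hypothetical). It returns true only when every method in the old list also appears in the new list:

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Returns true when every compression method in old_methods also appears
 * in new_methods, i.e. all existing compressed values remain decodable and
 * the table does not need to be rewritten. Returns false otherwise, in
 * which case the caller must rewrite (recompress) the affected column.
 */
static bool
compression_methods_preserved(const char *old_methods, size_t n_old,
                              const char *new_methods, size_t n_new)
{
    for (size_t i = 0; i < n_old; i++)
    {
        bool found = false;

        for (size_t j = 0; j < n_new; j++)
        {
            if (old_methods[i] == new_methods[j])
            {
                found = true;
                break;
            }
        }
        if (!found)
            return false;   /* some old method would be dropped */
    }
    return true;
}
```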
6) The new field in ColumnDef is missing a comment.
7) It's not clear to me what "partial list" in the PRESERVE docs means:
+ which of them should be kept on the column. Without PRESERVE or partial
+ list of compression methods the table will be rewritten.
8) The initial synopsis in alter_table.sgml includes the PRESERVE
syntax, but later in the page it's omitted (even though the section
talks about the keyword).
9) attcompression ...
The main issue I see is what the patch does with attcompression. Instead
of just using it to store the compression method, it's also used to
store the preserved compression methods. And using NameData to store
this seems wrong too - if we really want to store this info, the correct
way is either using text[] or inventing charvector or similar.
But to me this seems very much like a misuse of attcompression to track
dependencies on compression methods, necessary because we don't have a
separate catalog listing compression methods. If we had that, I think we
could simply add dependencies between attributes and that catalog.
Moreover, having the catalog would allow adding compression methods
(from extensions etc.) instead of just having a hard-coded list of
compression methods, which seems like a strange limitation considering
this thread is called "custom compression methods".
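A minimal sketch of the separate-catalog approach, assuming compression methods are rows keyed by OID and that columns record plain dependency rows on the methods their data uses. All names (`CompressionMethodEntry`, `ColumnDependency`, `lookup_compression_method`, `method_in_use`) are hypothetical and this is not PostgreSQL's pg_depend API. The point it illustrates: a column can depend on several methods (current plus preserved) without packing extra characters into attcompression, and whether a method can be dropped is answered from the dependencies.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef unsigned int Oid;   /* stand-in for PostgreSQL's Oid; 0 = invalid */

/* One row of a hypothetical compression-method catalog; extensions could
 * add rows instead of relying on a hard-coded list. */
typedef struct CompressionMethodEntry
{
    Oid         oid;
    const char *name;
} CompressionMethodEntry;

/* One hypothetical dependency: column (relid, attnum) still has values
 * compressed with method cmoid. A column may have several such rows. */
typedef struct ColumnDependency
{
    Oid relid;
    int attnum;
    Oid cmoid;
} ColumnDependency;

/* Look up a method by name; returns 0 (invalid OID) if not found. */
static Oid
lookup_compression_method(const CompressionMethodEntry *cat, size_t n,
                          const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(cat[i].name, name) == 0)
            return cat[i].oid;
    return 0;
}

/* True if any column still depends on the method, so dropping it must
 * be refused (or force a rewrite of the dependent columns). */
static bool
method_in_use(const ColumnDependency *deps, size_t n, Oid cmoid)
{
    for (size_t i = 0; i < n; i++)
        if (deps[i].cmoid == cmoid)
            return true;
    return false;
}
```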
10) compression parameters?
I wonder if we could/should allow parameters, like compression level
(and maybe other stuff, depending on the compression method). PG13
allowed that for opclasses, so perhaps we should allow it here too.
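As an illustration of what per-method parameters might involve, here is a sketch that parses a hypothetical `level=N` option for a zlib-like method and validates it against zlib's 0-9 level range. The option name, the `ZlibOptions` struct, and `parse_zlib_option` are assumptions, not part of the patch or of the PG13 opclass-options API:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed options for a zlib-like compression method. */
typedef struct ZlibOptions
{
    int level;          /* 0 (no compression) .. 9 (best), zlib's range */
} ZlibOptions;

#define ZLIB_DEFAULT_LEVEL 6   /* what Z_DEFAULT_COMPRESSION maps to */

/*
 * Parse a single "name=value" option string into opts. Returns false
 * (leaving opts unchanged) for an unknown option name or for a value
 * that is empty, non-numeric, or outside 0..9.
 */
static bool
parse_zlib_option(const char *opt, ZlibOptions *opts)
{
    const char *eq;
    char       *end;
    long        val;

    assert(opt != NULL && opts != NULL);

    eq = strchr(opt, '=');
    if (eq == NULL || (size_t) (eq - opt) != strlen("level") ||
        strncmp(opt, "level", strlen("level")) != 0)
        return false;

    errno = 0;
    val = strtol(eq + 1, &end, 10);
    if (errno != 0 || end == eq + 1 || *end != '\0' || val < 0 || val > 9)
        return false;

    opts->level = (int) val;
    return true;
}
```

Each method would own its option parser and validation, which is roughly how opclass options let each opclass define its own parameters.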
11) pg_column_compression
When specifying a compression method not present in attcompression, we
get this error message and hint:
test=# alter table t alter COLUMN a set compression "pglz" preserve (zlib);
ERROR: "zlib" compression access method cannot be preserved
HINT: use "pg_column_compression" function for list of compression methods
but there is no pg_column_compression function, so the hint is wrong.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services