Re: Pgoutput not capturing the generated columns

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: Shubham Khanna <khannashubham1197(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Rajendra Kumar Dangwal <dangwalrajendra888(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, euler(at)eulerto(dot)com
Subject: Re: Pgoutput not capturing the generated columns
Date: 2024-09-23 23:46:30
Message-ID: CAHut+Pt9ArtYc1Vtb1DqzEEFwjcoDHMyRCnUqkHQeFJuMCDkvQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi. Here are my review comments for v32-0001

You wrote: "I have addressed all the comments in the v32-0001 Patch.",
however, I found multiple old review comments not addressed. Please
give a reason if a comment is deliberately left out, otherwise, I will
assume they are omitted by accident and so keep repeating them.

There were also still some unanswered questions from previous reviews,
so I have reminded you about those again here.

======
Commit message

1.
This commit enables support for the 'publish_generated_columns' option
in logical replication, allowing the transmission of generated column
information and data alongside regular table changes. The option
'publish_generated_columns' is a PUBLICATION parameter.

~

That PUBLICATION info in the 2nd sentence would be easier to say in
the 1st sentence.
SUGGESTION:
This commit supports the transmission of generated column information
and data alongside regular table changes. This behaviour is controlled
by a new PUBLICATION parameter ('publish_generated_columns').

~~~

2.
When 'publish_generated_columns' is false, generated columns are not
replicated, even when present in a PUBLICATION col-list.

Hm. This contradicts the behaviour that Amit wanted, (e.g.
"column-list takes precedence"). So I am not sure if this patch is
already catering for the behaviour suggested by Amit or if that is yet
to come in v33. For now, I am assuming that 32* has not caught up with
the latest behaviour requirements, but that might be a wrong
assumption; perhaps it is only this commit message that is bogus.

~~~

3. General.

On the same subject, there is lots of code, like:

if (att->attgenerated && !pub->pubgencols)
continue;

I suspect that might not be quite what you want for the "column-list
takes precedence" behaviour, but I am not going to identify all those
during this review. It needs lots of combinations of column list tests
to verify it.

======
doc/src/sgml/ddl.sgml

4ab.
nit - Huh?? Not changed the linkend as told in a previous review [1-#3a]
nit - Huh?? Not changed to call this a "parameter" instead of an
"option" as told in a previous review [1-#3b]

======
doc/src/sgml/protocol.sgml

5.
- <para>
- Next, the following message part appears for each column included in
- the publication (except generated columns):
- </para>
-

nit -- Huh?? I don't think you can just remove this whole paragraph.
But, probably you can just remove the "except generated columns" part.
I posted this same comment [4 #11] 20 patch versions back.

======
doc/src/sgml/ref/create_publication.sgml

6abc.
nit - Huh?? Not changed the parameter ID as told in a previous review [1-#6]
nit - Huh?? Not removed paragraph "This option is only available..."
as told in a previous review. See [1-#7]
nit - Huh?? Not removed paragraph "This parameter can only be set" as
told in a previous review. See [1-#8]

======
src/backend/catalog/pg_publication.c

7.
if (TupleDescAttr(tupdesc, attnum - 1)->attgenerated)
- ereport(ERROR,
+ ereport(WARNING,
errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
- errmsg("cannot use generated column \"%s\" in publication column list",
+ errmsg("specified generated column \"%s\" in publication column list
for publication with publish_generated_columns as false",
colname));

I did not understand how this WARNING can know
"publish_generated_columns as false"? Should the code be checking the
function parameter 'pubgencols'?

The errmsg also seemed a bit verbose. How about:
"specified generated column \"%s\" in publication column list when
publish_generated_columns = false"

======
src/backend/replication/logical/proto.c

8.
logicalrep_write_tuple:
logicalrep_write_attrs:

Reminder. I think I have multiple questions about this code from
previous reviews that may be still unanswered. See [2 #4]. Maybe when
you implement Amit's "column list takes precedence" behaviour then
this code is fine as-is (because the replication message might include
gencols or not-gecols regardless of the 'publish_generated_columns'
value). But I don't think that is the current implementation, so
something did not quite seem right. I am not sure. If you say it is
fine then I will believe it, but the question [2 #4] remains
unanswered.

======
src/backend/replication/pgoutput/pgoutput.c

9.
send_relation_and_attrs:

Reminder: Here is another question that was answered from [2 #5]. I
did not really trust it for the current implementation, but for the
"column list takes precedence" behaviour probably it will be ok.

~~~

10.
+/*
+ * Prepare new column list bitmap. This includes all the columns of the table.
+ */
+static Bitmapset *
+prepare_all_columns_bms(PGOutputData *data, RelationSyncEntry *entry,
+ TupleDesc desc)
+{

This function needs a better comment with more explanation about what
this is REALLY doing. e.g. it says "includes all columns of the
table", but tthe implementation is skipping generated cols, so clearly
it is not "all columns of the table".

~~~

11. pgoutput_column_list_init

TBH, I struggle to read the logic of this function. Rewriting some
parts, inverting some variables, and adding more commentary might help
a lot.

11a.
There are too many "negatives" (with ! operator and with the word "no"
in the variable).

e.g. code is written in a backward way like:
if (!pub_no_list)
cols = pub_collist_to_bitmapset(cols, cfdatum, entry->entry_cxt);
else
cols = prepare_all_columns_bms(data, entry, desc);

instead of what could have been said:
if (pub_rel_has_collist)
cols = pub_collist_to_bitmapset(cols, cfdatum, entry->entry_cxt);
else
cols = prepare_all_columns_bms(data, entry, desc);

~

11b.
- * If the publication is FOR ALL TABLES then it is treated the same as
- * if there are no column lists (even if other publications have a
- * list).
+ * If the publication is FOR ALL TABLES and include generated columns
+ * then it is treated the same as if there are no column lists (even
+ * if other publications have a list).
*/
- if (!pub->alltables)
+ if (!pub->alltables || !pub->pubgencols)

The code does not appear to match the comment ("If the publication is
FOR ALL TABLES and include generated columns"). If it did it should
look like "if (pub->alltables && pub->pubgencols)".

Also, should "and include generated column" be properly referring to
the new PUBLICATION parameter name?

Also, the comment is somewhat confusing. I saw in the thread Vignesh
wrote an explanation like "To handle cases where the
publish_generated_columns option isn't specified for all tables in a
publication, the pubgencolumns check needs to be performed. In such
cases, we must create a column list that excludes generated columns"
[3]. IMO that was clearer information so something similar should be
written in this code comment.
~

11c.
+ /* Build the column list bitmap in the per-entry context. */
+ if (!pub_no_list || !pub->pubgencols) /* when not null */

I don't know what "when not null" means here. Aren't those both
booleans? How can it be "null"?

======
src/bin/pg_dump/pg_dump.c

12. getPublications:

Huh?? The code has not changed to address an old review comment I had
posted to say there seem multiple code fragments missing that should
say "false AS pubgencols". Refer to [2 #7].

======
src/bin/pg_dump/t/002_pg_dump.pl

13.
'ALTER PUBLICATION pub5 ADD TABLE test_table WHERE (col1 > 0);' => {
+ create_order => 51,
+ create_sql =>
+ 'ALTER PUBLICATION pub5 ADD TABLE dump_test.test_table WHERE (col1 > 0);',
+ regexp => qr/^
+ \QALTER PUBLICATION pub5 ADD TABLE ONLY dump_test.test_table WHERE
((col1 > 0));\E
+ /xm,
+ like => { %full_runs, section_post_data => 1, },
+ unlike => {
+ exclude_dump_test_schema => 1,
+ exclude_test_table => 1,
+ },
+ },
+
+ 'ALTER PUBLICATION pub5 ADD TABLE test_second_table WHERE (col2 = \'test\');'
+ => {
+ create_order => 52,
+ create_sql =>
+ 'ALTER PUBLICATION pub5 ADD TABLE dump_test.test_second_table
WHERE (col2 = \'test\');',
+ regexp => qr/^
+ \QALTER PUBLICATION pub5 ADD TABLE ONLY dump_test.test_second_table
WHERE ((col2 = 'test'::text));\E
+ /xm,
+ like => { %full_runs, section_post_data => 1, },
+ unlike => { exclude_dump_test_schema => 1, },
+ },
+

It wasn't clear to me how these tests are related to the patch.
Shouldn't there instead be some ALTER tests for trying to modify the
'publish_generate_columns' parameter?

======
src/test/regress/expected/publication.out
src/test/regress/sql/publication.sql

14.
--- error: generated column "d" can't be in list
+-- ok: generated columns can be in the list too
ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (a, d);
-ERROR: cannot use generated column "d" in publication column list
+WARNING: specified generated column "d" in publication column list
for publication with publish_generated_columns as false

I think these tests for the WARNING scenario need to be a bit more
deliberate. This seems to have happened as a side-effect. For example,
I was expecting more testing like:

Comments about various combinations to say what you are doing and what
you are expecting:
- gencols in column list with publish_generated_columns=false, expecting WARNING
- gencols in column list with publish_generated_columns=true, NOT
expecting WARNING
- gencols in column list with publish_generated_columns=true, then
ALTER PUBLICATION setting publication_generate_columns=false,
expecting WARNING
- NO gencols in column list with publish_generated_columns=false, then
ALTER PUBLICATION to add gencols to column list, expecting WARNING

======
src/test/subscription/t/031_column_list.pl

15.
-# TEST: Generated and dropped columns are not considered for the column list.
+# TEST: Dropped columns are not considered for the column list.
# So, the publication having a column list except for those columns and a
-# publication without any column (aka all columns as part of the columns
+# publication without any column list (aka all columns as part of the column
# list) are considered to have the same column list.
$node_publisher->safe_psql(
'postgres', qq(
CREATE TABLE test_mix_4 (a int PRIMARY KEY, b int, c int, d int
GENERATED ALWAYS AS (a + 1) STORED);
ALTER TABLE test_mix_4 DROP COLUMN c;

- CREATE PUBLICATION pub_mix_7 FOR TABLE test_mix_4 (a, b);
- CREATE PUBLICATION pub_mix_8 FOR TABLE test_mix_4;
+ CREATE PUBLICATION pub_mix_7 FOR TABLE test_mix_4 WITH
(publish_generated_columns = true);
+ CREATE PUBLICATION pub_mix_8 FOR TABLE test_mix_4 WITH
(publish_generated_columns = false);

I felt the comment for this test ought to be saying something more
about what you are doing with the 'publish_generated_columns'
parameters and what behaviour it was expecting.

======
Please refer to the attachment which addresses some of the nit
comments mentioned above.

======
[1] my review of v31-0001:
https://www.postgresql.org/message-id/CAHut%2BPsv-neEP_ftvBUBahh%2BKCWw%2BqQMF9N3sGU3YHWPEzFH-Q%40mail.gmail.com
[2] my review of v30-0001:
https://www.postgresql.org/message-id/CAHut%2BPuaitgE4tu3nfaR%3DPCQEKjB%3DmpDtZ1aWkbwb%3DJZE8YvqQ%40mail.gmail.com
[3] https://www.postgresql.org/message-id/CALDaNm1c7xPBodHw6LKp9e8hvGVJHcKH%3DDHK0iXmZuXKPnxZ3Q%40mail.gmail.com
[4] https://www.postgresql.org/message-id/CAHut%2BPv45gB4cV%2BSSs6730Kb8urQyqjdZ9PBVgmpwqCycr1Ybg%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment Content-Type Size
PS_NITPICKS_v320001.txt text/plain 2.4 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2024-09-24 00:06:40 Re: Partitioned tables and [un]loggedness
Previous Message David Rowley 2024-09-23 22:21:58 Re: Why don't we consider explicit Incremental Sort?