Re: Pgoutput not capturing the generated columns

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: Shubham Khanna <khannashubham1197(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Rajendra Kumar Dangwal <dangwalrajendra888(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, euler(at)eulerto(dot)com
Subject: Re: Pgoutput not capturing the generated columns
Date: 2024-10-28 02:13:10
Message-ID: CAHut+PsM38DWdMvCCeSh9YxW7KONQfNrcwFBoK-mQChtzTwrEg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi, here are my review comments for patch v43-0001.

======

1. Missing docs update?

The CREATE PUBLICATION docs currently says:
When a column list is specified, only the named columns are
replicated. If no column list is specified, all columns of the table
are replicated through this publication, including any columns added
later.

~

For this patch, should that be updated to say "... all columns (except
generated columns) of the table are replicated..."

======
src/backend/replication/logical/proto.c

2.
+static bool
+should_publish_column(Form_pg_attribute att, Bitmapset *columns)
+{
+ if (att->attisdropped)
+ return false;
+
+ /*
+ * Skip publishing generated columns if they are not included in the
+ * column list.
+ */
+ if (att->attgenerated && !columns)
+ return false;
+
+ if (!column_in_column_list(att->attnum, columns))
+ return false;
+
+ return true;
+}

Here, I wanted to suggest that the whole "Skip publishing generated
columns" if-part is unnecessary because the next check
(!column_in_column_list) is going to return false for the same
scenario anyhow.

But, unfortunately, the "column_in_column_list" function has some
special NULL handling logic in it; this means none of this code is
quite what it seems to be (e.g. the function name
column_in_column_list is somewhat misleading)

IMO it would be better to change the column_in_column_list signature
-- add another boolean param to say if a NULL column list is allowed
or not. That will remove any subtle behaviour and then you can remove
the "if (att->attgenerated && !columns)" part.

======
src/backend/replication/pgoutput/pgoutput.c

3. send_relation_and_attrs

- if (att->attisdropped || att->attgenerated)
+ if (att->attisdropped)
continue;

if (att->atttypid < FirstGenbkiObjectId)
continue;

+ /*
+ * Skip publishing generated columns if they are not included in the
+ * column list.
+ */
+ if (att->attgenerated && !columns)
+ continue;
+
/* Skip this attribute if it's not present in the column list */
if (columns != NULL && !bms_is_member(att->attnum, columns))
continue;
~

Most of that code above looks to be doing the very same thing as the
new 'should_publish_column' in proto.c. Won't it be better to expose
the other function and share the common logic?

~~~

4. pgoutput_column_list_init

- if (att->attisdropped || att->attgenerated)
+ if (att->attisdropped)
continue;

+ if (att->attgenerated)
+ {
+ if (bms_is_member(att->attnum, cols))
+ gencolpresent = true;
+
+ continue;
+ }
+
nliveatts++;
}

/*
- * If column list includes all the columns of the table,
- * set it to NULL.
+ * If column list includes all the columns of the table
+ * and there are no generated columns, set it to NULL.
*/
- if (bms_num_members(cols) == nliveatts)
+ if (bms_num_members(cols) == nliveatts && !gencolpresent)
{

Something seems not quite right (or maybe redundant) with this logic.
For example, because you unconditionally 'continue' for generated
columns, then AFAICT it is just not possible for bms_num_members(cols)
== nliveatts and at the same time 'gencolpresent' to be true. So you
could've just Asserted (!gencolpresent) instead of checking it in the
condition and mentioning it in the comment.

======
src/test/regress/expected/publication.out

5.
--- error: generated column "d" can't be in list
+-- ok: generated column "d" can be in the list too
ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (a, d);
-ERROR: cannot use generated column "d" in publication column list

By allowing the above to work without giving ERROR, I think you've
broken many subsequent test expected results. e.g. I don't trust these
"expected" results anymore because I didn't think these next test
errors should have been affected, right?

-- error: system attributes "ctid" not allowed in column list
ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (a, ctid);
-ERROR: cannot use system column "ctid" in publication column list
+ERROR: relation "testpub_tbl5" is already member of publication
"testpub_fortable"

Hmm - looks like a wrong expected result to me.

~

-- error: duplicates not allowed in column list
ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (a, a);
-ERROR: duplicate column "a" in publication column list
+ERROR: relation "testpub_tbl5" is already member of publication
"testpub_fortable"

Hmm - looks like a wrong expected result to me.

probably more like this...

======
src/test/subscription/t/031_column_list.pl

6.
+$node_subscriber->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE test_gen (a int PRIMARY KEY, b int);
+));
+
+$node_subscriber->safe_psql(
+ 'postgres', qq(
+ CREATE SUBSCRIPTION sub_gen CONNECTION '$publisher_connstr'
PUBLICATION pub_gen;
+));

Should combine these.

~~~

7.
+$node_publisher->wait_for_catchup('sub_gen');
+
+is( $node_subscriber->safe_psql(
+ 'postgres', "SELECT * FROM test_gen ORDER BY a"),
+ qq(1|2),
+ 'replication with generated columns in column list');
+

But, this is only testing normal replication. You should also include
some initial table data so you can test that the initial table
synchronization works too. Otherwise, I think current this patch has
no proof that the initial 'copy_data' even works at all.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Marcelo Fernandes 2024-10-28 02:43:23 Proposal to add exclusion constraint from existing index
Previous Message Umar Hayat 2024-10-28 01:42:26 Re: msvc directory missing in PostgreSQL 17.0