Re: Pgoutput not capturing the generated columns

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: Shubham Khanna <khannashubham1197(at)gmail(dot)com>
Cc: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Rajendra Kumar Dangwal <dangwalrajendra888(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "euler(at)eulerto(dot)com" <euler(at)eulerto(dot)com>
Subject: Re: Pgoutput not capturing the generated columns
Date: 2024-10-09 05:43:11
Message-ID: CAHut+Pv8Do3b7QhzHg7dRWhO317ZFZKY_mYQaFBOWVQ-P1805A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi, here are my review comments for patch v37-0001.

======
Commit message

1.
Example usage of subscription option:
CREATE PUBLICATION FOR TABLE tab_gencol WITH (publish_generated_columns
= true);

~

This is wrong -- it's not a "subscription option". Better to just say
"Example usage:"

~~~

2.
When 'copy_data' is true, during the initial sync, the data is replicated from
the publisher to the subscriber using the COPY command. The normal COPY
command does not copy generated columns, so when 'publish_generated_columns'
is true...

~

By only mentioning the "when ... is true" case this description does
not cover the scenario when 'publish_generated_columns' is false when
the publication column list has a generated column.

~~~

3.
typo - /replication of generated column/replication of generated columns/
typo - /filed/filled/
typo - 'pg_publicataion' catalog

======
src/backend/replication/logical/tablesync.c

make_copy_attnamelist:
4.
nit - missing word in a comment

~~~

fetch_remote_table_info:
5.
+ appendStringInfo(&cmd,
" FROM pg_catalog.pg_attribute a"
" LEFT JOIN pg_catalog.pg_index i"
" ON (i.indexrelid = pg_get_replica_identity_index(%u))"
" WHERE a.attnum > 0::pg_catalog.int2"
- " AND NOT a.attisdropped %s"
+ " AND NOT a.attisdropped", lrel->remoteid);
+
+ appendStringInfo(&cmd,
" AND a.attrelid = %u"
" ORDER BY a.attnum",
- lrel->remoteid,
- (walrcv_server_version(LogRepWorkerWalRcvConn) >= 120000 ?
- "AND a.attgenerated = ''" : ""),
lrel->remoteid);

Version v37-0001 has removed a condition previously between these two
appendStringInfo's. But, that now means there is no reason to keep
these statements separated. These should be combined now to use one
appendStringInfo.

~

6.
+ if (server_version >= 120000)
+ remotegenlist[natt] = DatumGetBool(slot_getattr(slot, 5, &isnull));
+

Are you sure the version check for 120000 is correct? IIUC, this 5
matches the 'attgenerated' column, but the SQL for that was
constructed using a different condition:
if (server_version >= 180000)
appendStringInfo(&cmd, ", a.attgenerated != ''");

It is this 120000 versus 180000 difference that makes me suspicious of
a potential mistake.

~~~

7.
+ /*
+ * If the column is generated and neither the generated column option
+ * is specified nor it appears in the column list, we will skip it.
+ */
+ if (remotegenlist[natt] && !has_pub_with_pubgencols &&
+ !bms_is_member(attnum, included_cols))
+ {
+ ExecClearTuple(slot);
+ continue;
+ }

7b.
I am also suspicious about how this condition interacts with the other
condition (shown below) that came earlier:
/* If the column is not in the column list, skip it. */
if (included_cols != NULL && !bms_is_member(attnum, included_cols))

Something doesn't seem right. e.g. If we can only get here by passing
the earlier condition, then it means we already know the generated
condition was *not* a member of a column list.... in which case that
should affect this new condition and the new comment too.

======
src/backend/replication/pgoutput/pgoutput.c

pgoutput_column_list_init:

8.
/*
- * If the publication is FOR ALL TABLES then it is treated the same as
- * if there are no column lists (even if other publications have a
- * list).
+ * Process potential column lists for the following cases: a. Any
+ * publication that is not FOR ALL TABLES. b. When the publication is
+ * FOR ALL TABLES and 'publish_generated_columns' is false. FOR ALL
+ * TABLES publication doesn't have user-defined column lists, so all
+ * columns will be replicated by default. However, if
+ * 'publish_generated_columns' is set to false, column lists must
+ * still be created to exclude any generated columns from being
+ * published.
*/

nit - please reformat this comment so the bullets are readable

======
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment Content-Type Size
PS_NITPICKS_GENCOLS_v370001.txt text/plain 2.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Richard Guo 2024-10-09 06:06:11 Re: Inconsistent RestrictInfo serial numbers
Previous Message vignesh C 2024-10-09 05:30:22 Re: Pgoutput not capturing the generated columns