Re: Pgoutput not capturing the generated columns

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: vignesh C <vignesh21(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Shubham Khanna <khannashubham1197(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Rajendra Kumar Dangwal <dangwalrajendra888(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, euler(at)eulerto(dot)com
Subject: Re: Pgoutput not capturing the generated columns
Date: 2024-11-05 01:29:59
Message-ID: CAHut+Puy8V=ouejVrpn1X94iKt1j_9aaNCrUOzv9ViZw8A4x7A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Vignesh,

Here are my review comments for your latest patch v47-0001.

======
doc/src/sgml/ddl.sgml

1.
<para>
- Generated columns can be replicated during logical replication by
- including them in the column list of the
- <command>CREATE PUBLICATION</command> command.
+ Generated columns are allowed to be replicated during logical replication
+ according to the <command>CREATE PUBLICATION</command> option
+ <link linkend="sql-createpublication-params-with-publish-generated-columns">
+ <literal>include_generated_columns</literal></link> or by including them
+ in the column list of the <command>CREATE PUBLICATION</command> command.
</para>

1a.
This text gives the wrong name for the new parameter.
/include_generated_columns/publish_generated_columns/

~

1b.
Everywhere in this patch (except here), this is called the
'publish_generated_columns' parameter (not "option") so it should be
called a parameter here also. Anyway, apparently that is the docs rule
-- see [1].

BTW, the same applies for the commit message 1st line of this patch:
[PATCH v47 1/2] Enable support for 'publish_generated_columns' option.
Should be
[PATCH v47 1/2] Enable support for 'publish_generated_columns' parameter.

======
doc/src/sgml/protocol.sgml

2.
- Next, one of the following submessages appears for each column:
+ Next, one of the following submessages appears for each published column:

The change is OK. But, note that there are other descriptions just
like this one on the same page, so if you are going to say "published"
here, then to be consistent you probably want to consider updating the
other places as well.

======
src/backend/catalog/pg_publication.c

3.
+bool
+has_column_list_defined(Publication *pub, Oid relid)
+{
+ HeapTuple cftuple = NULL;
+ bool isnull = true;

Since you chose not to rearrange the HeapTupleIsValid check, this
'isnull' declaration should be relocated within the if-block.

======
src/backend/replication/logical/proto.c

4.
/*
* Check if the column 'att' of a table should be published.
*
- * 'columns' represents the column list specified for that table in the
- * publication.
+ * 'columns' represents the publication column list (if any) for that table.
*
- * Note that generated columns can be present only in 'columns' list.
+ * Note that generated columns can be published only when present in a
+ * publication column list, or when include_gencols is true.
*/
bool
-logicalrep_should_publish_column(Form_pg_attribute att, Bitmapset *columns)
+logicalrep_should_publish_column(Form_pg_attribute att, Bitmapset *columns,
+ bool include_gencols)

The function comment describes 'columns' but it doesn't describe
'include_gencols'. I think knowing more about that parameter would be
helpful.

SUGGESTION:
The 'include_gencols' flag indicates whether generated columns should
be published when there is no column list. Typically, this will have
the same value as the 'publish_generated_columns' publication
parameter.

======
src/backend/replication/logical/relation.c

5.
@@ -421,7 +421,7 @@ logicalrep_rel_open(LogicalRepRelId remoteid,
LOCKMODE lockmode)
int attnum;
Form_pg_attribute attr = TupleDescAttr(desc, i);

- if (attr->attisdropped || attr->attgenerated)
+ if (attr->attisdropped)
{
entry->attrmap->attnums[i] = -1;
continue;
@@ -432,7 +432,15 @@ logicalrep_rel_open(LogicalRepRelId remoteid,
LOCKMODE lockmode)

entry->attrmap->attnums[i] = attnum;
if (attnum >= 0)
+ {
+ if (attr->attgenerated)
+ ereport(ERROR,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("replicating to a target relation's generated column \"%s\"
for \"%s.%s\" is not supported",
+ NameStr(attr->attname), remoterel->nspname, remoterel->relname));
+
missingatts = bms_del_member(missingatts, attnum);
+ }

Hmm. I think this more descriptive error is a good improvement over
the previous "missing" error, but I just don't think it belongs in
this patch. This is impacting the existing "regular" ==> "generated"
replication as well, which seems out-of-scope for this gencols patch.

IMO this ought to be made as a separate patch that can be pushed to
master separately/independently *before* any of this new gencols
stuff.

Also, you already said in the commit message:
* Publisher not-generated column => subscriber generated column:
This will give ERROR (not changed by this patch).

So the "not changed by this patch" part is not true if these changes
are included.

======
src/backend/replication/pgoutput/pgoutput.c

6.
+ /*
+ * Include publishing generated columns if 'publish_generated_columns'
+ * parameter is set to true, this will be set only if the relation
+ * contains any generated column.
+ */
+ bool include_gencols;
+

Minor rewording.

SUGGESTION:
Include generated columns for publication is set true if
'publish_generated_columns' parameter is true, and the relation
contains generated columns.

~~~

7.
+ /*
+ * Retrieve the columns if they haven't been prepared yet, and
+ * only if multiple publications exist.
+ */
+ if (!relcols && (list_length(publications) > 1))
+ {
+ pgoutput_ensure_entry_cxt(data, entry);
+ relcols = pub_form_cols_map(relation, entry->include_gencols,
+ entry->entry_cxt);
+ }

IIUC the purpose of this is for ensuring that the column lists are
consistent across all publications. That is why we only do this when
there are > 1 publications. For the 1st publication with no column
list we cache all the columns (in 'relcols') so later the cols of the
*current* publication (in 'cols') can be checked to see if they are
the same.

TBH, I think this part needs to have more explanation because it's a
bit too subtle; you have to read between the lines to figure out what
it is doing instead of just having a comment to clearly describe the
logic up-front.

======
[1] option versus parameter -
https://www.postgresql.org/message-id/CAKFQuwZVJ%2B_Z0pMX%3DBBKF9A6skVqiv89gxEgFOX7cwtWJj-Ccw%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Corey Huinker 2024-11-05 02:22:28 Re: Statistics Import and Export
Previous Message Michael Paquier 2024-11-05 01:12:58 Re: general purpose array_sort