Re: Pgoutput not capturing the generated columns

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: vignesh C <vignesh21(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, "Shinoda, Noriyoshi (SXD Japan FSIP)" <noriyoshi(dot)shinoda(at)hpe(dot)com>, Shubham Khanna <khannashubham1197(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Rajendra Kumar Dangwal <dangwalrajendra888(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "euler(at)eulerto(dot)com" <euler(at)eulerto(dot)com>
Subject: Re: Pgoutput not capturing the generated columns
Date: 2025-01-17 05:52:33
Message-ID: CAHut+PvrumYk7DxW6yLnx9ju9P24rU00-xw+fYbQqGS_gq4Kng@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Vignesh.

Some review comments for patch v52-0003

======

1. GENERAL - change to use enum.

On Thu, Jan 16, 2025 at 7:47 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Wed, 15 Jan 2025 at 11:17, Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> > 2.
> > As suggested in more detail below, I think it would be better if you
> > can define a C code enum type for these potential values instead of
> > just using #define macros and char. I guess that will impact a lot of
> > the APIs.
>
> If we change it to enum, we will not be able to access
> PUBLISH_GENCOLS_NONE and PUBLISH_GENCOLS_STORED from describe.c files.
> Maybe that is the reason the macros were used in the case of
> pg_subscription.h also.

Hm. I am not sure. Can't you just define the enum inside the #ifdef
EXPOSE_TO_CLIENT_CODE? I saw some examples of this already (see
src/include/catalog/pg_cast.h)

e.g. I tried following, which compiles for me:

#ifdef EXPOSE_TO_CLIENT_CODE

typedef enum PublishGencolsType
{
/* Generated columns present should not be replicated. */
PUBLISH_GENCOLS_NONE = 'n',

/* Generated columns present should be replicated. */
PUBLISH_GENCOLS_STORED = 's',

} PublishGencolsType;

#endif /* EXPOSE_TO_CLIENT_CODE */

typedef struct Publication
{
Oid oid;
char *name;
bool alltables;
bool pubviaroot;
PublishGencolsType pubgencols_type;
PublicationActions pubactions;
} Publication;

======
doc/src/sgml/ref/create_publication.sgml

2.
<para>
Specifies whether the generated columns present in the tables
associated with the publication should be replicated.
- The default is <literal>false</literal>.
+ The default is <literal>none</literal> meaning the generated
+ columns present in the tables associated with publication will not be
+ replicated.
+ </para>

IMO it would be better to add another sentence before "The default
is..." to just spell out what the possible values are up-front. The
"The default is..." can also be preceded by a blank line.

SUGGESTION
Specifies whether the generated columns present in the tables
associated with the publication should be replicated. Possible values
are <literal>none</literal> and <literal>stored</literal>.

The default is <literal>none</literal> meaning...

======
src/backend/catalog/pg_publication.c

pub_form_cols_map:

3.
+ if (att->attgenerated)
+ {
+ if (att->attgenerated != ATTRIBUTE_GENERATED_STORED)
+ continue;
+ else if (include_gencols_type != PUBLISH_GENCOLS_STORED)
+ continue;
+ }
+

I find it easier to read without the 'else' keyword. Also, I think
these can be commented on.

SUGGESTION
/* We only support replication of STORED generated cols. */
if (att->attgenerated != ATTRIBUTE_GENERATED_STORED)
continue;

/* User hasn't requested to replicate STORED generated cols. */
if (include_gencols_type != PUBLISH_GENCOLS_STORED)
continue;

~~~

pg_get_publication_tables:

4.
- if (att->attisdropped || (att->attgenerated && !pub->pubgencols))
+ if (att->attisdropped)
continue;

+ if (att->attgenerated)
+ {
+ if (att->attgenerated != ATTRIBUTE_GENERATED_STORED)
+ continue;
+ else if (pub->pubgencols_type != PUBLISH_GENCOLS_STORED)
+ continue;
+ }
+

Ditto previous review comment #3 above.

======
src/backend/commands/publicationcmds.c

pub_contains_invalid_column:

5.
The function comment is still referring to "enabling
publish_generated_columns option" which kind of sounds like a boolean.

~~~

6.
/*
- * The publish_generated_columns option must be set to true if the
- * REPLICA IDENTITY contains any stored generated column.
+ * The publish_generated_columns option must be set to 's'(stored)
+ * if the REPLICA IDENTITY contains any stored generated column.
*/
- if (!pubgencols && att->attgenerated)
+ if (pubgencols_type == PUBLISH_GENCOLS_NONE && att->attgenerated)

The comment and the code seem a bit out of sync; the comment says must
be 's', so to match that the code ought to be checking !=
PUBLISH_GENCOLS_STORED.

~~~

defGetGeneratedColsOption:

7.
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("%s requires a \"none\" or \"stored\"",
+ def->defname));
+

Should that have the word "value"?
e.g. "%s requires a \"none\" or \"stored\" value

======
src/backend/replication/logical/proto.c

logicalrep_should_publish_column:

8.
+ return (att->attgenerated == ATTRIBUTE_GENERATED_STORED) ?
(include_gencols_type == PUBLISH_GENCOLS_STORED) : false;

The ternary is fine, but it might be neater to split it out so you can
comment on it

SUGGESTION
/* Stored generated columns are only published when the user sets
publish_generated_columns = stored. */
if (att->attgenerated == ATTRIBUTE_GENERATED_STORED)
return include_gencols_type == PUBLISH_GENCOLS_STORED;

return false;

======
src/backend/replication/pgoutput/pgoutput.c

typedef struct RelationSyncEntry:

9.
/*
- * This is set if the 'publish_generated_columns' parameter is true, and
- * the relation contains generated columns.
+ * This will be 's'(stored) if the relation contains generated columns and
+ * the 'publish_generated_columns' parameter is set to 's'(stored).
+ * Otherwise, it will be 'n'(none), indicating that no generated columns
+ * should be published.
*/
- bool include_gencols;
+ char include_gencols_type;

Is the comment strictly correct? What about overriding column lists
where the option is set to 'none'?

======
src/include/catalog/pg_publication.h

10.
- /* true if generated columns data should be published */
- bool pubgencols;
+ /*
+ * 's'stored) if generated column data should be published, 'n'(none) if
+ * it should not be published
+ */
+ char pubgencols_type;

Typo. Missing '('

Also better to put each value on a separate line, and explicitly
mention 'stored', etc.

SUGGESTION
/*
* 'n'(none) if generated column data should not be published.
* 's'(stored) if stored generated column data should be published.
*/

======
src/test/subscription/t/011_generated.pl

11.
# =============================================================================
# Exercise logical replication of a generated column to a subscriber side
# regular column. This is done both when the publication parameter
-# 'publish_generated_columns' is set to false (to confirm existing default
-# behavior), and is set to true (to confirm replication occurs).
+# 'publish_generated_columns' is set to 'none' (to confirm existing default
+# behavior), and is set to stored (to confirm replication occurs).

Why 'none' in quotes but stored not in quotes?

======
Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bertrand Drouvot 2025-01-17 06:06:35 Re: per backend I/O statistics
Previous Message Tom Lane 2025-01-17 05:43:45 Re: leafhopper: Assert("outerstartsel <= outerendsel"), File: "costsize.c"