Re: Pgoutput not capturing the generated columns

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, "Shinoda, Noriyoshi (SXD Japan FSIP)" <noriyoshi(dot)shinoda(at)hpe(dot)com>, Shubham Khanna <khannashubham1197(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Rajendra Kumar Dangwal <dangwalrajendra888(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "euler(at)eulerto(dot)com" <euler(at)eulerto(dot)com>
Subject: Re: Pgoutput not capturing the generated columns
Date: 2025-01-16 08:47:08
Message-ID: CALDaNm3OcXdY0EzDEKAfaK9gq2B67Mfsgxu93+_249ohyts=0g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 15 Jan 2025 at 11:17, Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Hi Vignesh.
>
> Some review comments for patch 0001
>
> ======
> GENERAL
>
> 1.
> AFAIK there are still many places in the docs where there is no
> distinction made between the stored/virtual generated cols. Maybe we
> need another patch in this patch set to address all of these?

> For example, CREATE PUBLICATION says "The column list can contain
> generated columns as well.". Is that really true? What happens if we
> try to publish a VIRTUAL column via a column list?.

Currently I have mentioned that only stored generated columns will be
published. Let's see if there are only a few changes we can keep with
this patch else we could merge with the other document patch.

> ~~
>
> 2.
> As suggested in more detail below, I think it would be better if you
> can define a C code enum type for these potential values instead of
> just using #define macros and char. I guess that will impact a lot of
> the APIs.

If we change it to enum, we will not be able to access
PUBLISH_GENCOLS_NONE and PUBLISH_GENCOLS_STORED from describe.c files.
Maybe that is the reason the macros were used in the case of
pg_subscription.h also.

> ======
> doc/src/sgml/ref/create_publication.sgml
>
> CREATE PUBLICATION, publish_generated_columns (enum)
>
> 3.
> + <para>
> + If set to <literal>stored</literal>, the generated columns present in
> + the tables associated with publication will be replicated.
> </para>
>
> Maybe that say should say /the generated columns/the STORED generated columns/

Modified

> ~~~
>
> 4.
> The next NOTE part is still referring wrongly to a boolean option:
>
> If the subscriber is from a release prior to 18, then initial table
> synchronization won't copy generated columns even if parameter
> publish_generated_columns is true in the publisher.
>
> SUGGESTION
> If the subscriber is from a release prior to 18, then initial table
> synchronization won't copy generated columns even if parameter
> publish_generated_columns is not <literal>none</literal> in the
> publisher.

Modified

> ======
> src/backend/catalog/pg_publication.c
>
> pub_form_cols_map:
>
> 5.
> /*
> * Returns a bitmap representing the columns of the specified table.
> *
> - * Generated columns are included if include_gencols is true.
> + * Generated columns are included if gencols_type is 'stored'.
> */
>
> The code is only checking 'none' but in future just because it is not
> 'none' does not mean it must be 'stored'. So maybe better to make this
> comment future proof.
>
> SUGGESTION
> Generated columns are included if gencols_type is not PUBLISH_GENCOLS_NONE.

The current approach is more appropriate here, as replication of
generated virtual columns will not be supported in the initial
version.

> ======
> src/backend/commands/publicationcmds.c
>
> defGetGeneratedColsOption:
>
> 6.
> + /*
> + * If no parameter value given, assume "stored" is meant.
> + */
> + if (!def->arg)
> + return PUBLISH_GENCOL_STORED;
>
> Really, I think if no parameter is given it should mean "all". Now,
> today there is only one valid value so 'all' and 'stored' are
> equivalent, but I was wondering should you add another 'a'/'all' enum
> so this meaning will already there for the future? Otherwise the no
> parameter implementIion will have a slightly different meaning later.

I believe the "all" option should be added when virtual columns are
introduced. Since the replication of virtual generated columns will
likely be phased—first with the addition of the virtual generated
column feature itself, followed later by support for logical
replication of these columns—it’s probable that logical replication of
virtual columns won’t be available in the initial release. Therefore,
the "all" option should be introduced when logical replication of
virtual generated columns is added.

> ~~
>
> 7.
> + ereport(ERROR,
> + errcode(ERRCODE_SYNTAX_ERROR),
> + errmsg("%s requires a \"none\" or \"stored\"",
> + def->defname));
> + return PUBLISH_GENCOL_NONE; /* keep compiler quiet */
>
> Put a blank line after the ereport.

Modified

> ======
> src/backend/replication/logical/proto.c
>
> 8. General for all this file
>
> static void logicalrep_write_attrs(StringInfo out, Relation rel,
> - Bitmapset *columns, bool include_gencols);
> + Bitmapset *columns, char gencols_type);
> static void logicalrep_write_tuple(StringInfo out, Relation rel,
> TupleTableSlot *slot,
> bool binary, Bitmapset *columns,
> - bool include_gencols);
> + char gencols_type);
>
> IMO all these parameter names changes all throughout this file should
> be more like "include_gencols_type"

Modified

> ~~~
>
> 9.
> * 'gencols_type' value indicates whether generated columns should be
> * published when there is no column list. Typically, this will have the same
> * value as the 'publish_generated_columns' publication parameter.
> *
> * Note that generated columns can be published only when present in a
> * publication column list, or when gencols_type is 'stored'.
>
> 9a.
> /gencols_type/include_gencols_type/

Modified

> ~
>
> 9b..
> To future-proof this comment it might be more generic to say:
>
> "...or when include_gencols_type is not PUBLISH_GENCOLS_NONE"

The current approach is more appropriate here, when virtual generated
columns are added we don't want it to be published without special
handling for it.

> ~~
>
> 10.
> /* All non-generated columns are always published. */
> - return att->attgenerated ? include_gencols : true;
> + return att->attgenerated ? (gencols_type == PUBLISH_GENCOL_STORED) : true;
>
> Would it be better (more future-proof) to express this like:
>
> return att->attgenerated ? (gencols_type != PUBLISH_GENCOLS_NONE) : true;

Modified

> ======
> src/backend/replication/pgoutput/pgoutput.c
>
> 11.
> As mentioned elsewhere, IMO it might be more meaningful to name the
> param/field/variable with an appropriate verb -- e.g.
> "include_gencols_type" or "pubgencols_type", instead of just
> "gencols_type".

Modified

> ~~~
>
> typedef struct RelationSyncEntry:
>
> 12.
> /*
> - * This is set if the 'publish_generated_columns' parameter is true, and
> - * the relation contains generated columns.
> + * This is set if the 'publish_generated_columns' parameter is 'stored',
> + * and the relation contains generated columns.
> */
> - bool include_gencols;
> + char gencols_type;
>
> The comment doesn't make sense to me -- I think something got lost
> when translating it from a boolean. e.g. I thought this field should
> *always* be set, even if it is set to a value of PUBLISH_GENCOLS_NONE.

Modified

> ======
> src/backend/utils/cache/relcache.c
> ======
> src/bin/pg_dump/pg_dump.c
>
> 13.
> if (fout->remoteVersion >= 180000)
> - appendPQExpBufferStr(query, "p.pubgencols ");
> + appendPQExpBufferStr(query, "p.pubgencols_type ");
> else
> - appendPQExpBufferStr(query, "false AS pubgencols ");
> + appendPQExpBufferStr(query, "false AS pubgencols_type ");
>
> Is that a bug? I thought it should be "'n' AS pubgencols_type"

Yes it should be n, modified

> ~~~
>
> 14.
> - i_pubgencols = PQfnumber(res, "pubgencols");
> + i_pubgencols = PQfnumber(res, "pubgencols_type");
>
> Shouldn't we also rename the variable being assigned -- e.g. i_pubgencols_type

Modified

> ======
> src/bin/pg_dump/pg_dump.h
> ======
> src/bin/pg_dump/t/002_pg_dump.pl
> ======
> src/bin/psql/describe.c
> ======
> src/include/catalog/pg_publication.h
>
> 15.
> - /* true if generated columns data should be published */
> - bool pubgencols;
> + /* 's'(stored) if generated columns data should be published */
> + char pubgencols_type;
>
> I thought the comment should document all the possible values -- e.g.
> what about 'n'?

Modified

> ~~~
>
> 16.
> - bool pubgencols;
> + char pubgencols_type;
> PublicationActions pubactions;
> } Publication;
>
> @@ -124,6 +124,16 @@ typedef struct PublicationRelInfo
> List *columns;
> } PublicationRelInfo;
>
> IMO all the APIs for functions that use this could be improved if you
> are able to define a proper C enum for this field instead of just
> using chars.
>
> Anyway, you can still assigned the enum values to 'n' and 's' if it helps.

If we change it to enum, we will not be able to access
PUBLISH_GENCOLS_NONE and PUBLISH_GENCOLS_STORED from describe.c files.
Maybe that is the reason the macros were used in the case of
pg_subscription.h also.

> ~~~
>
> 17,
> +#ifdef EXPOSE_TO_CLIENT_CODE
> +
> +/* Generated columns present should not be replicated. */
> +#define PUBLISH_GENCOL_NONE 'n'
> +
> +/* Generated columns present should be replicated. */
> +#define PUBLISH_GENCOL_STORED 's'
> +
> +#endif /* EXPOSE_TO_CLIENT_CODE */
>
> These values (which I thought ought to be enum values) should be
> PUBLISH_GENCOLS_NONE, and PUBLISH_GENCOLS_STORED (e.g. _GENCOLS_
> instead of _GENCOL_).

Modified

> ======
> src/include/commands/publicationcmds.h
>
> 18.
> extern bool pub_contains_invalid_column(Oid pubid, Relation relation,
> List *ancestors, bool pubviaroot,
> - bool pubgencols,
> + char gencols_type,
> bool *invalid_column_list,
> bool *invalid_gen_col);
> Should new param name be 'pubgencols_type'?

Modified

> ======
> src/include/replication/logicalproto.h
>
> 19.
> - bool include_gencols);
> + char gencols_type);
>
> Should all the changes like this be named "include_gencols_type" ?

Modified

> ======
> src/test/regress/expected/publication.out
>
> 20.
> \dRp
> List of publications
> Name | Owner | All tables | Inserts
> | Updates | Deletes | Truncates | Generated columns | Via root
> --------------------+--------------------------+------------+---------+---------+---------+-----------+-------------------+----------
> testpib_ins_trunct | regress_publication_user | f | t
> | f | f | f | n | f
> testpub_default | regress_publication_user | f | f
> | t | f | f | n | f
> (2 rows)
>
> 20a.
> Why does that display 'n'? I expected it should say 'none'

Modified

> ~
>
> 20b.
> Looks like a typo in the name /testpib_ins_trunct/testpub_ins_trunct/

Modified

> ~~~
>
> 21.
> \dRp+ testpub_default
> Publication testpub_default
> Owner | All tables | Inserts | Updates | Deletes |
> Truncates | Generated columns | Via root
> --------------------------+------------+---------+---------+---------+-----------+-------------------+----------
> regress_publication_user | f | t | t | t |
> f | none | f
> (1 row)
>
> -- fail - must be owner of publication
> SET ROLE regress_publication_user_dummy;
> ALTER PUBLICATION testpub_default RENAME TO testpub_dummy;
> ERROR: must be owner of publication testpub_default
> RESET ROLE;
> ALTER PUBLICATION testpub_default RENAME TO testpub_foo;
> \dRp testpub_foo
> List of publications
> Name | Owner | All tables | Inserts |
> Updates | Deletes | Truncates | Generated columns | Via root
> -------------+--------------------------+------------+---------+---------+---------+-----------+-------------------+----------
> testpub_foo | regress_publication_user | f | t | t
> | t | f | n | f
> (1 row)
>
> Notice there is a 'n' for \dRp but a 'none' for \dRp+. It looks like
> only the \dRp is broken.

Modified

> ======
> src/test/regress/sql/publication.sql
>
> 22. General
>
> I found lots of places referring to 'publication_generate_columns'.
> But there is no such thing. It is supposed to say
> publish_generated_columns.

Modified

> ~~
>
> 23. Missing test?
>
> I think there now should be a test for a publication with
> publish_gfenerated_column option, but having no value to verify that
> it is the same as stored.

Added a test for this

> ======
> src/test/subscription/t/011_generated.pl
>
> 24.
> # =============================================================================
> # Exercise logical replication of a generated column to a subscriber side
> # regular column. This is done both when the publication parameter
> -# 'publish_generated_columns' is set to false (to confirm existing default
> +# 'publish_generated_columns' is set to 'none' (to confirm existing default
> # behavior), and is set to true (to confirm replication occurs).
>
> Not fully updated. This is still saying "and is set to true"

Modified

> ~~~
>
> 25.
> # Verify that the generated column data is not replicated during incremental
> # replication when publish_generated_columns is set to false.
>
> The above comment (still present in the file) still refers to the
> boolean values of the option.

Modified

Thanks for the comments, the attached v2 version patch has the changes
for the same.

v52-0001 - There was an issue when we describe publication for a PG17
server, this 0001 patch fixes the same.
v52-0002 - One typo related to a publication name which Peter reported.
v52-0003 - Change publish_generated_columns option to use enum instead
of boolean.
v52-0004 and v52-0005 are the previous patches from [1].

[1] - https://www.postgresql.org/message-id/CAHut%2BPsPZVE8r1VeznuF3hSiTxcbprzaiBedSmMv39FYoHn4YA%40mail.gmail.com

Regards,
Vignesh

Attachment Content-Type Size
v52-0001-Fix-describe-publication-with-PG17.patch text/x-patch 2.5 KB
v52-0002-Fix-a-small-typo-in-publication-name.patch text/x-patch 6.1 KB
v52-0003-Change-publish_generated_columns-option-to-use-e.patch text/x-patch 88.4 KB
v52-0004-Add-missing-pubgencols-attribute-docs-for-pg_pub.patch text/x-patch 1.1 KB
v52-0005-DOCS-Generated-Column-Replication.patch text/x-patch 12.6 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message vignesh C 2025-01-16 08:49:31 Re: Pgoutput not capturing the generated columns
Previous Message jian he 2025-01-16 08:44:18 Re: Non-text mode for pg_dumpall