Re: Pgoutput not capturing the generated columns

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: vignesh C <vignesh21(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, "Shinoda, Noriyoshi (SXD Japan FSIP)" <noriyoshi(dot)shinoda(at)hpe(dot)com>, Shubham Khanna <khannashubham1197(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Rajendra Kumar Dangwal <dangwalrajendra888(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "euler(at)eulerto(dot)com" <euler(at)eulerto(dot)com>
Subject: Re: Pgoutput not capturing the generated columns
Date: 2025-01-15 05:47:17
Message-ID: CAHut+Pt_A9ZNyTPPxai8qs2eQ2VYWwRQytsB=3Qd0+c+14X5Dg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Vignesh.

Some review comments for patch 0001

======
GENERAL

1.
AFAIK there are still many places in the docs where there is no
distinction made between the stored/virtual generated cols. Maybe we
need another patch in this patch set to address all of these?

For example, CREATE PUBLICATION says "The column list can contain
generated columns as well.". Is that really true? What happens if we
try to publish a VIRTUAL column via a column list?

~~

2.
As suggested in more detail below, I think it would be better if you
can define a C code enum type for these potential values instead of
just using #define macros and char. I guess that will impact a lot of
the APIs.

======
doc/src/sgml/ref/create_publication.sgml

CREATE PUBLICATION, publish_generated_columns (enum)

3.
+ <para>
+ If set to <literal>stored</literal>, the generated columns present in
+ the tables associated with publication will be replicated.
</para>

Maybe that say should say /the generated columns/the STORED generated columns/

~~~

4.
The next NOTE part is still referring wrongly to a boolean option:

If the subscriber is from a release prior to 18, then initial table
synchronization won't copy generated columns even if parameter
publish_generated_columns is true in the publisher.

SUGGESTION
If the subscriber is from a release prior to 18, then initial table
synchronization won't copy generated columns even if parameter
publish_generated_columns is not <literal>none</literal> in the
publisher.

======
src/backend/catalog/pg_publication.c

pub_form_cols_map:

5.
/*
* Returns a bitmap representing the columns of the specified table.
*
- * Generated columns are included if include_gencols is true.
+ * Generated columns are included if gencols_type is 'stored'.
*/

The code is only checking 'none' but in future just because it is not
'none' does not mean it must be 'stored'. So maybe better to make this
comment future proof.

SUGGESTION
Generated columns are included if gencols_type is not PUBLISH_GENCOLS_NONE.

======
src/backend/commands/publicationcmds.c

defGetGeneratedColsOption:

6.
+ /*
+ * If no parameter value given, assume "stored" is meant.
+ */
+ if (!def->arg)
+ return PUBLISH_GENCOL_STORED;

Really, I think if no parameter is given it should mean "all". Now,
today there is only one valid value so 'all' and 'stored' are
equivalent, but I was wondering should you add another 'a'/'all' enum
so this meaning will already there for the future? Otherwise the no
parameter implementIion will have a slightly different meaning later.

~~

7.
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("%s requires a \"none\" or \"stored\"",
+ def->defname));
+ return PUBLISH_GENCOL_NONE; /* keep compiler quiet */

Put a blank line after the ereport.

======
src/backend/replication/logical/proto.c

8. General for all this file

static void logicalrep_write_attrs(StringInfo out, Relation rel,
- Bitmapset *columns, bool include_gencols);
+ Bitmapset *columns, char gencols_type);
static void logicalrep_write_tuple(StringInfo out, Relation rel,
TupleTableSlot *slot,
bool binary, Bitmapset *columns,
- bool include_gencols);
+ char gencols_type);

IMO all these parameter names changes all throughout this file should
be more like "include_gencols_type"

~~~

9.
* 'gencols_type' value indicates whether generated columns should be
* published when there is no column list. Typically, this will have the same
* value as the 'publish_generated_columns' publication parameter.
*
* Note that generated columns can be published only when present in a
* publication column list, or when gencols_type is 'stored'.

9a.
/gencols_type/include_gencols_type/

~

9b..
To future-proof this comment it might be more generic to say:

"...or when include_gencols_type is not PUBLISH_GENCOLS_NONE"

~~

10.
/* All non-generated columns are always published. */
- return att->attgenerated ? include_gencols : true;
+ return att->attgenerated ? (gencols_type == PUBLISH_GENCOL_STORED) : true;

Would it be better (more future-proof) to express this like:

return att->attgenerated ? (gencols_type != PUBLISH_GENCOLS_NONE) : true;

======
src/backend/replication/pgoutput/pgoutput.c

11.
As mentioned elsewhere, IMO it might be more meaningful to name the
param/field/variable with an appropriate verb -- e.g.
"include_gencols_type" or "pubgencols_type", instead of just
"gencols_type".

~~~

typedef struct RelationSyncEntry:

12.
/*
- * This is set if the 'publish_generated_columns' parameter is true, and
- * the relation contains generated columns.
+ * This is set if the 'publish_generated_columns' parameter is 'stored',
+ * and the relation contains generated columns.
*/
- bool include_gencols;
+ char gencols_type;

The comment doesn't make sense to me -- I think something got lost
when translating it from a boolean. e.g. I thought this field should
*always* be set, even if it is set to a value of PUBLISH_GENCOLS_NONE.

======
src/backend/utils/cache/relcache.c
======
src/bin/pg_dump/pg_dump.c

13.
if (fout->remoteVersion >= 180000)
- appendPQExpBufferStr(query, "p.pubgencols ");
+ appendPQExpBufferStr(query, "p.pubgencols_type ");
else
- appendPQExpBufferStr(query, "false AS pubgencols ");
+ appendPQExpBufferStr(query, "false AS pubgencols_type ");

Is that a bug? I thought it should be "'n' AS pubgencols_type"

~~~

14.
- i_pubgencols = PQfnumber(res, "pubgencols");
+ i_pubgencols = PQfnumber(res, "pubgencols_type");

Shouldn't we also rename the variable being assigned -- e.g. i_pubgencols_type

======
src/bin/pg_dump/pg_dump.h
======
src/bin/pg_dump/t/002_pg_dump.pl
======
src/bin/psql/describe.c
======
src/include/catalog/pg_publication.h

15.
- /* true if generated columns data should be published */
- bool pubgencols;
+ /* 's'(stored) if generated columns data should be published */
+ char pubgencols_type;

I thought the comment should document all the possible values -- e.g.
what about 'n'?

~~~

16.
- bool pubgencols;
+ char pubgencols_type;
PublicationActions pubactions;
} Publication;

@@ -124,6 +124,16 @@ typedef struct PublicationRelInfo
List *columns;
} PublicationRelInfo;

IMO all the APIs for functions that use this could be improved if you
are able to define a proper C enum for this field instead of just
using chars.

Anyway, you can still assigned the enum values to 'n' and 's' if it helps.

~~~

17,
+#ifdef EXPOSE_TO_CLIENT_CODE
+
+/* Generated columns present should not be replicated. */
+#define PUBLISH_GENCOL_NONE 'n'
+
+/* Generated columns present should be replicated. */
+#define PUBLISH_GENCOL_STORED 's'
+
+#endif /* EXPOSE_TO_CLIENT_CODE */

These values (which I thought ought to be enum values) should be
PUBLISH_GENCOLS_NONE, and PUBLISH_GENCOLS_STORED (e.g. _GENCOLS_
instead of _GENCOL_).

======
src/include/commands/publicationcmds.h

18.
extern bool pub_contains_invalid_column(Oid pubid, Relation relation,
List *ancestors, bool pubviaroot,
- bool pubgencols,
+ char gencols_type,
bool *invalid_column_list,
bool *invalid_gen_col);
Should new param name be 'pubgencols_type'?

======
src/include/replication/logicalproto.h

19.
- bool include_gencols);
+ char gencols_type);

Should all the changes like this be named "include_gencols_type" ?

======
src/test/regress/expected/publication.out

20.
\dRp
List of publications
Name | Owner | All tables | Inserts
| Updates | Deletes | Truncates | Generated columns | Via root
--------------------+--------------------------+------------+---------+---------+---------+-----------+-------------------+----------
testpib_ins_trunct | regress_publication_user | f | t
| f | f | f | n | f
testpub_default | regress_publication_user | f | f
| t | f | f | n | f
(2 rows)

20a.
Why does that display 'n'? I expected it should say 'none'

~

20b.
Looks like a typo in the name /testpib_ins_trunct/testpub_ins_trunct/

~~~

21.
\dRp+ testpub_default
Publication testpub_default
Owner | All tables | Inserts | Updates | Deletes |
Truncates | Generated columns | Via root
--------------------------+------------+---------+---------+---------+-----------+-------------------+----------
regress_publication_user | f | t | t | t |
f | none | f
(1 row)

-- fail - must be owner of publication
SET ROLE regress_publication_user_dummy;
ALTER PUBLICATION testpub_default RENAME TO testpub_dummy;
ERROR: must be owner of publication testpub_default
RESET ROLE;
ALTER PUBLICATION testpub_default RENAME TO testpub_foo;
\dRp testpub_foo
List of publications
Name | Owner | All tables | Inserts |
Updates | Deletes | Truncates | Generated columns | Via root
-------------+--------------------------+------------+---------+---------+---------+-----------+-------------------+----------
testpub_foo | regress_publication_user | f | t | t
| t | f | n | f
(1 row)

Notice there is a 'n' for \dRp but a 'none' for \dRp+. It looks like
only the \dRp is broken.

======
src/test/regress/sql/publication.sql

22. General

I found lots of places referring to 'publication_generate_columns'.
But there is no such thing. It is supposed to say
publish_generated_columns.

~~~

23. Missing test?

I think there now should be a test for a publication with
publish_gfenerated_column option, but having no value to verify that
it is the same as stored.

======
src/test/subscription/t/011_generated.pl

24.
# =============================================================================
# Exercise logical replication of a generated column to a subscriber side
# regular column. This is done both when the publication parameter
-# 'publish_generated_columns' is set to false (to confirm existing default
+# 'publish_generated_columns' is set to 'none' (to confirm existing default
# behavior), and is set to true (to confirm replication occurs).

Not fully updated. This is still saying "and is set to true"

~~~

25.
# Verify that the generated column data is not replicated during incremental
# replication when publish_generated_columns is set to false.

The above comment (still present in the file) still refers to the
boolean values of the option.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Shlok Kyal 2025-01-15 06:07:43 Re: Introduce XID age and inactive timeout based replication slot invalidation
Previous Message Ashutosh Bapat 2025-01-15 05:46:02 Re: POC: enable logical decoding when wal_level = 'replica' without a server restart