Re: Pgoutput not capturing the generated columns

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: vignesh C <vignesh21(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Shubham Khanna <khannashubham1197(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Rajendra Kumar Dangwal <dangwalrajendra888(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, euler(at)eulerto(dot)com
Subject: Re: Pgoutput not capturing the generated columns
Date: 2024-11-01 01:40:23
Message-ID: CAHut+Pt2gW6fNLn0PWFmk3DtDnv9pT--geENe51_+9R7_V6GzA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Oct 31, 2024 at 3:16 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> Thanks for committing this patch, here is a rebased version of the
> remaining patches.
>

Hi Vignesh, thanks for the rebased patches.

Here are my review comments for patch v1-0001.

======
Commit message.

1.
The commit message text is stale, so needs some updates.

For example, it is still saying "Generated column values are not
currently replicated..." but that is not correct anymore since the
recent push of the previous v46-0001 patch [1], which already
implemented replication of generated columns when they are specified
in a publication column list..

======
doc/src/sgml/ddl.sgml

2.
<para>
- Generated columns are skipped for logical replication and cannot be
- specified in a <command>CREATE PUBLICATION</command> column list.
+ Generated columns are allowed to be replicated during logical replication
+ according to the <command>CREATE PUBLICATION</command> option
+ <link linkend="sql-createpublication-params-with-publish-generated-columns">
+ <literal>include_generated_columns</literal></link>.
</para>

This explanation is incomplete because generated columns can also be
specified in a publication column list which has nothing to do with
the new option. In fact, lack of mention about the column lists seems
like an oversight which should have been in the previous patch [1]. I
already posted another mail about this [2].

======
doc/src/sgml/protocol.sgml

3.
<para>
- Next, one of the following submessages appears for each column:
+ Next, one of the following submessages appears for each column
(except generated columns):

Hmm. Now that generated column replication is supported is this change
still required?

======
doc/src/sgml/ref/create_publication.sgml

4.
+
+ <varlistentry
id="sql-createpublication-params-with-publish-generated-columns">
+ <term><literal>publish_generated_columns</literal>
(<type>boolean</type>)</term>
+ <listitem>
+ <para>
+ Specifies whether the generated columns present in the tables
+ associated with the publication should be replicated.
+ The default is <literal>false</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+

I know that the subsequent DOCS patch V1-0002 will explain more about
this, but as a stand-alone patch 0001 maybe you need to clarify that a
publication column list will override this 'publish_generated_columns'
parameter.

======
src/backend/catalog/pg_publication.c

has_column_list_defined:

5.
+/*
+ * Returns true if the relation has column list associated with the
publication,
+ * false otherwise.
+ */
+bool
+has_column_list_defined(Publication *pub, Oid relid)
+{
+ HeapTuple cftuple = NULL;
+ bool isnull = true;
+
+ if (pub->alltables)
+ return false;
+
+ cftuple = SearchSysCache2(PUBLICATIONRELMAP,
+ ObjectIdGetDatum(relid),
+ ObjectIdGetDatum(pub->oid));
+ if (HeapTupleIsValid(cftuple))
+ {
+ /* Lookup the column list attribute. */
+ (void) SysCacheGetAttr(PUBLICATIONRELMAP, cftuple,
+ Anum_pg_publication_rel_prattrs,
+ &isnull);
+ if (!isnull)
+ {
+ ReleaseSysCache(cftuple);
+ return true;
+ }
+
+ ReleaseSysCache(cftuple);
+ }
+
+ return false;
+}
+

5a.
It might be tidier if you checked for !HeapTupleIsValid(cftuple) and
do early return false, instead of needing an indented if-block.

~

5b.
The code can be rearranged and simplified -- you don't need multiple
calls to ReleaseSysCache.

SUGGESTION:

/* Lookup the column list attribute. */
(void) SysCacheGetAttr(PUBLICATIONRELMAP, cftuple,
Anum_pg_publication_rel_prattrs,
&isnull);

ReleaseSysCache(cftuple);
/* Was a column list found? */
return isnull ? false : true;

~~~

pub_getallcol_bitmapset:

6.
+/*
+ * Return a column list bitmap for the specified table.
+ *
+ * Generated columns are included if pubgencols is true.
+ *
+ * If mcxt isn't NULL, build the bitmapset in that context.
+ */
+Bitmapset *
+pub_getallcol_bitmapset(Relation relation, bool pubgencols,
+ MemoryContext mcxt)

IIUC this is a BMS of the table columns to be published. The function
comment seems confusing to me when it says "column list bitmap"
because IIUC this function is not really anything to do with a
publication "column list", which is an entirely different thing.

======
src/backend/replication/logical/proto.c

7.
static void logicalrep_write_attrs(StringInfo out, Relation rel,
- Bitmapset *columns);
+ Bitmapset *columns, bool pubgencols);
static void logicalrep_write_tuple(StringInfo out, Relation rel,
TupleTableSlot *slot,
- bool binary, Bitmapset *columns);
+ bool binary, Bitmapset *columns,
+ bool pubgencols);

The meaning of all these new 'pubgencols' are ambiguous. e.g. Are they
(a) The value of the CREATE PUBLICATION 'publish_generate_columns'
parameter, or does it mean (b) Just some generated column is being
published (maybe via column list or maybe not).

I think it means (a) but, if true, that could be made much more clear
by changing all of these names to 'pubgencols_option' or something
similar. Actually, now I have doubts about that also -- I think this
might be magically assigned to false if no generated columns exist in
the table. Anyway, please do whatever you can to disambiguate this.

~~~

logicalrep_should_publish_column:

8.
The function comment is stale. It is still only talking about
generated columns in column lists.

SUGGESTION
Note that generated columns can be published only when present in a
publication column list, or (if there is no column list), when the
publication parameter 'publish_generated_columns' is true.

~~~

9.
bool
logicalrep_should_publish_column(Form_pg_attribute att, Bitmapset *columns,
bool pubgencols)
{
if (att->attisdropped)
return false;

/*
* Skip publishing generated columns if they are not included in the
* column list or if the option is not specified.
*/
if (!columns && !pubgencols && att->attgenerated)
return false;

/*
* Check if a column is covered by a column list.
*/
if (columns && !bms_is_member(att->attnum, columns))
return false;

return true;
}

Same as mentioned before in my previous v46-0001 review comments, I
feel that the conditionals of this function are over-complicated and
that there are more 'return' points than necessary. The alternative
code below looks simpler to me.

SUGGESTION
bool
logicalrep_should_publish_column(Form_pg_attribute att, Bitmapset *columns,
bool pubgencols_option)
{
if (att->attisdropped)
return false;

if (columns)
{
/*
* Has a column list:
* Publish only cols named in that list.
*/
return bms_is_member(att->attnum, columns);
}
else
{
/*
* Has no column list:
* Publish generated cols only if 'publish_generated_cols' is true.
* Publish all non-generated cols.
*/
return att->attgenerated ? pubgencols_option : true;
}
}

======
src/backend/replication/pgoutput/pgoutput.c

10.
+ /* Include publishing generated columns */
+ bool pubgencols;
+

There is similar ambiguity here with this field-name as was mentioned
about for other 'pbgencols' function params. I had initially thought
that this this just caries around same value as the publication option
'publish_generated_columns' but now (after looking at function
check_and_init_gencol) I think that might not be the case because I
saw it can be assigned false (overriding the publication option?).

Anyway, the comment needs to be made much clearer about what is the
true meaning of this field. Or, rename it if there is a better name.

~~~

11.
+static void send_relation_and_attrs(Relation relation, TransactionId xid,
+ LogicalDecodingContext *ctx,
+ RelationSyncEntry *relentry);

Was there some reason to move this static? Maybe it is better just to
change the existing static in-place rather than moving code around at
the same time.

~~~

send_relation_and_attrs:

12.
- if (!logicalrep_should_publish_column(att, columns))
+ if (!logicalrep_should_publish_column(att, columns, relentry->pubgencols))
continue;
It seemed a bit strange/inconsistent that 'columns' was assigned to a
local var, but 'pubgencols' was not, given they are both fields of the
same struct. Maybe removing this 'columns' var would be consistent
with other code in this patch.

~~~

13.
check_and_init_gencol:

nit - missing periods for comments.

~~~

14.
+ /* There is no generated columns to be published *

/There is no generated columns/There are no generated columns/

~~~

15.
+ foreach(lc, publications)
+ {
+ Publication *pub = lfirst(lc);

AFAIK this can be re-written using a different macro to avoid needing
the 'lc' var.

~~~

pgoutput_column_list_init:

16.
+ bool collistpubexist = false;

This seemed like not a very good name, How about 'found_pub_with_collist';

~~~

17.
bool pub_no_list = true;

nit - Not caused by this patch, but it's closely related; In passing
we should declare this variable at a lower scope, and rename it to
'isnull' which is more in keeping with the comments around it.

~~~

18.
+ /*
+ * For non-column list publications—such as TABLE (without a column
+ * list), ALL TABLES, or ALL TABLES IN SCHEMA publications consider
+ * all columns of the table, including generated columns, based on the
+ * pubgencols option.
+ */

Some minor tweaks.

SUGGESTION
For non-column list publications — e.g. TABLE (without a column list),
ALL TABLES, or ALL TABLES IN SCHEMA, we consider all columns of the
table (including generated columns when 'publish_generated_columns'
option is true).

~~~

19.
+ Assert(pub->pubgencols == entry->pubgencols);
+
+ /*
+ * Retrieve the columns if they haven't been prepared yet, or if
+ * there are multiple publications.
+ */
+ if (!relcols && (list_length(publications) > 1))
+ {
+ pgoutput_ensure_entry_cxt(data, entry);
+ relcols = pub_getallcol_bitmapset(relation, entry->pubgencols,
+ entry->entry_cxt);
+ }

19a.
Is that Assert correct? I ask only because AFAICT I saw in previous
function (check_and_init_gencol) there is code that might change
entry->pubgencols = false; even if the 'publish_generated_columns'
option is true but there were not generated columns found in the
table.

~

19b.
The comment says "or if there are multiple publications" but the code
says &&. Something seems wrong.

~~~

20.
+ /*
+ * If no column list publications exit, columns will be selected later
+ * according to the generated columns option.
+ */

20a.
typo - /exit/exist/

~

20b.
There is a GENERAL problem that applies for lots of comments of this
patch (including this comment) because the new publication option is
referred to inconsistently in many different ways:

e.g.
- the generated columns option.
- if the option is not specified
- publish_generated_columns option.
- the pubgencols option
- 'publish_generated_columns' option

All these references should be made the same. My personal preference
is the last one ('publish_generated_columns' option).

~~~

get_rel_sync_entry:

21.
+ /* Check whether to publish to generated columns. */
+ check_and_init_gencol(data, rel_publications, entry);
+

typo in comment - "publishe to"?

======
src/include/catalog/pg_publication.h

22.
extern Bitmapset *pub_collist_to_bitmapset(Bitmapset *columns, Datum pubcols,
MemoryContext mcxt);
+extern Bitmapset *pub_getallcol_bitmapset(Relation relation, bool pubgencols,
+ MemoryContext mcxt);

Maybe a better name for this new function is 'pub_allcols_bitmapse'.
That would then be consistent with the other BMS function which
doesn't include the word "get".

======
Some of my suggested updates above are already implemented in the
attached nitpicks diff. Please refer to it.

======
[1] push support for gencols in column list --
https://github.com/postgres/postgres/commit/745217a051a9341e8c577ea59a87665d331d4af0
[2] docs oversight in commit --
https://www.postgresql.org/message-id/CAHut%2BPtBVSxNDph-mHP_SE4%2BWw%2BxJ0SKhVupnx5uVhK3V_SDHw%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment Content-Type Size
PS_NITPICKS_GENCOLS_V10001.txt text/plain 5.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrei Lepikhov 2024-11-01 01:49:34 Re: Consider the number of columns in the sort cost model
Previous Message Michael Paquier 2024-11-01 01:34:18 Re: [PATCH] Add array_reverse() function