Re: Pgoutput not capturing the generated columns

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Shubham Khanna <khannashubham1197(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Rajendra Kumar Dangwal <dangwalrajendra888(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, euler(at)eulerto(dot)com
Subject: Re: Pgoutput not capturing the generated columns
Date: 2024-11-04 14:58:53
Message-ID: CALDaNm2sNfZoFfqOKq9GAjQZd3isqosij9iHaJjn7oQVmLLNYw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 1 Nov 2024 at 07:10, Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> On Thu, Oct 31, 2024 at 3:16 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > Thanks for committing this patch, here is a rebased version of the
> > remaining patches.
> >
>
> Hi Vignesh, thanks for the rebased patches.
>
> Here are my review comments for patch v1-0001.
>
> ======
> Commit message.
>
> 1.
> The commit message text is stale, so needs some updates.
>
> For example, it is still saying "Generated column values are not
> currently replicated..." but that is not correct anymore since the
> recent push of the previous v46-0001 patch [1], which already
> implemented replication of generated columns when they are specified
> in a publication column list..

Modified

> ======
> doc/src/sgml/ddl.sgml
>
> 2.
> <para>
> - Generated columns are skipped for logical replication and cannot be
> - specified in a <command>CREATE PUBLICATION</command> column list.
> + Generated columns are allowed to be replicated during logical replication
> + according to the <command>CREATE PUBLICATION</command> option
> + <link linkend="sql-createpublication-params-with-publish-generated-columns">
> + <literal>include_generated_columns</literal></link>.
> </para>
>
> This explanation is incomplete because generated columns can also be
> specified in a publication column list which has nothing to do with
> the new option. In fact, lack of mention about the column lists seems
> like an oversight which should have been in the previous patch [1]. I
> already posted another mail about this [2].

Modified

> ======
> doc/src/sgml/protocol.sgml
>
> 3.
> <para>
> - Next, one of the following submessages appears for each column:
> + Next, one of the following submessages appears for each column
> (except generated columns):
>
> Hmm. Now that generated column replication is supported is this change
> still required?

Modified

> ======
> doc/src/sgml/ref/create_publication.sgml
>
> 4.
> +
> + <varlistentry
> id="sql-createpublication-params-with-publish-generated-columns">
> + <term><literal>publish_generated_columns</literal>
> (<type>boolean</type>)</term>
> + <listitem>
> + <para>
> + Specifies whether the generated columns present in the tables
> + associated with the publication should be replicated.
> + The default is <literal>false</literal>.
> + </para>
> + </listitem>
> + </varlistentry>
> +
>
> I know that the subsequent DOCS patch V1-0002 will explain more about
> this, but as a stand-alone patch 0001 maybe you need to clarify that a
> publication column list will override this 'publish_generated_columns'
> parameter.

I felt it is better to keep it in 0002 patch itself.

>
> ======
> src/backend/catalog/pg_publication.c
>
> has_column_list_defined:
>
> 5.
> +/*
> + * Returns true if the relation has column list associated with the
> publication,
> + * false otherwise.
> + */
> +bool
> +has_column_list_defined(Publication *pub, Oid relid)
> +{
> + HeapTuple cftuple = NULL;
> + bool isnull = true;
> +
> + if (pub->alltables)
> + return false;
> +
> + cftuple = SearchSysCache2(PUBLICATIONRELMAP,
> + ObjectIdGetDatum(relid),
> + ObjectIdGetDatum(pub->oid));
> + if (HeapTupleIsValid(cftuple))
> + {
> + /* Lookup the column list attribute. */
> + (void) SysCacheGetAttr(PUBLICATIONRELMAP, cftuple,
> + Anum_pg_publication_rel_prattrs,
> + &isnull);
> + if (!isnull)
> + {
> + ReleaseSysCache(cftuple);
> + return true;
> + }
> +
> + ReleaseSysCache(cftuple);
> + }
> +
> + return false;
> +}
> +
>
> 5a.
> It might be tidier if you checked for !HeapTupleIsValid(cftuple) and
> do early return false, instead of needing an indented if-block.

I preferred the existing, I did not see any advantage by doing this.

> ~
>
> 5b.
> The code can be rearranged and simplified -- you don't need multiple
> calls to ReleaseSysCache.
>
> SUGGESTION:
>
> /* Lookup the column list attribute. */
> (void) SysCacheGetAttr(PUBLICATIONRELMAP, cftuple,
> Anum_pg_publication_rel_prattrs,
> &isnull);
>
> ReleaseSysCache(cftuple);
> /* Was a column list found? */
> return isnull ? false : true;

Modified

> ~~~
>
> pub_getallcol_bitmapset:
>
> 6.
> +/*
> + * Return a column list bitmap for the specified table.
> + *
> + * Generated columns are included if pubgencols is true.
> + *
> + * If mcxt isn't NULL, build the bitmapset in that context.
> + */
> +Bitmapset *
> +pub_getallcol_bitmapset(Relation relation, bool pubgencols,
> + MemoryContext mcxt)
>
> IIUC this is a BMS of the table columns to be published. The function
> comment seems confusing to me when it says "column list bitmap"
> because IIUC this function is not really anything to do with a
> publication "column list", which is an entirely different thing.

Changed the function name and updated comments based on Amit's
suggestion from [1]

> ======
> src/backend/replication/logical/proto.c
>
> 7.
> static void logicalrep_write_attrs(StringInfo out, Relation rel,
> - Bitmapset *columns);
> + Bitmapset *columns, bool pubgencols);
> static void logicalrep_write_tuple(StringInfo out, Relation rel,
> TupleTableSlot *slot,
> - bool binary, Bitmapset *columns);
> + bool binary, Bitmapset *columns,
> + bool pubgencols);
>
> The meaning of all these new 'pubgencols' are ambiguous. e.g. Are they
> (a) The value of the CREATE PUBLICATION 'publish_generate_columns'
> parameter, or does it mean (b) Just some generated column is being
> published (maybe via column list or maybe not).
>
> I think it means (a) but, if true, that could be made much more clear
> by changing all of these names to 'pubgencols_option' or something
> similar. Actually, now I have doubts about that also -- I think this
> might be magically assigned to false if no generated columns exist in
> the table. Anyway, please do whatever you can to disambiguate this.

Changed it to include_gencols based on Amit's suggestion from [1]

> ~~~
>
> logicalrep_should_publish_column:
>
> 8.
> The function comment is stale. It is still only talking about
> generated columns in column lists.
>
> SUGGESTION
> Note that generated columns can be published only when present in a
> publication column list, or (if there is no column list), when the
> publication parameter 'publish_generated_columns' is true.

Modified

> ~~~
>
> 9.
> bool
> logicalrep_should_publish_column(Form_pg_attribute att, Bitmapset *columns,
> bool pubgencols)
> {
> if (att->attisdropped)
> return false;
>
> /*
> * Skip publishing generated columns if they are not included in the
> * column list or if the option is not specified.
> */
> if (!columns && !pubgencols && att->attgenerated)
> return false;
>
> /*
> * Check if a column is covered by a column list.
> */
> if (columns && !bms_is_member(att->attnum, columns))
> return false;
>
> return true;
> }
>
> Same as mentioned before in my previous v46-0001 review comments, I
> feel that the conditionals of this function are over-complicated and
> that there are more 'return' points than necessary. The alternative
> code below looks simpler to me.
>
> SUGGESTION
> bool
> logicalrep_should_publish_column(Form_pg_attribute att, Bitmapset *columns,
> bool pubgencols_option)
> {
> if (att->attisdropped)
> return false;
>
> if (columns)
> {
> /*
> * Has a column list:
> * Publish only cols named in that list.
> */
> return bms_is_member(att->attnum, columns);
> }
> else
> {
> /*
> * Has no column list:
> * Publish generated cols only if 'publish_generated_cols' is true.
> * Publish all non-generated cols.
> */
> return att->attgenerated ? pubgencols_option : true;
> }
> }

Modified

> ======
> src/backend/replication/pgoutput/pgoutput.c
>
> 10.
> + /* Include publishing generated columns */
> + bool pubgencols;
> +
>
> There is similar ambiguity here with this field-name as was mentioned
> about for other 'pbgencols' function params. I had initially thought
> that this this just caries around same value as the publication option
> 'publish_generated_columns' but now (after looking at function
> check_and_init_gencol) I think that might not be the case because I
> saw it can be assigned false (overriding the publication option?).
>
> Anyway, the comment needs to be made much clearer about what is the
> true meaning of this field. Or, rename it if there is a better name.

Changed the variable to avoid confusion.

> ~~~
>
> 11.
> +static void send_relation_and_attrs(Relation relation, TransactionId xid,
> + LogicalDecodingContext *ctx,
> + RelationSyncEntry *relentry);
>
> Was there some reason to move this static? Maybe it is better just to
> change the existing static in-place rather than moving code around at
> the same time.

This function now requires the RelationSyncEntry parameter to publish
generated columns. Since this structure was defined after the previous
function prototype, it has been moved below the prototype.

> ~~~
>
> send_relation_and_attrs:
>
> 12.
> - if (!logicalrep_should_publish_column(att, columns))
> + if (!logicalrep_should_publish_column(att, columns, relentry->pubgencols))
> continue;
> It seemed a bit strange/inconsistent that 'columns' was assigned to a
> local var, but 'pubgencols' was not, given they are both fields of the
> same struct. Maybe removing this 'columns' var would be consistent
> with other code in this patch.

Modified to use a local variable.

> ~~~
>
> 13.
> check_and_init_gencol:
>
> nit - missing periods for comments.

Modified

> ~~~
>
> 14.
> + /* There is no generated columns to be published *
>
> /There is no generated columns/There are no generated columns/

Modified

> ~~~
>
> 15.
> + foreach(lc, publications)
> + {
> + Publication *pub = lfirst(lc);
>
> AFAIK this can be re-written using a different macro to avoid needing
> the 'lc' var.

Modified

> ~~~
>
> pgoutput_column_list_init:
>
> 16.
> + bool collistpubexist = false;
>
> This seemed like not a very good name, How about 'found_pub_with_collist';

Modified

> ~~~
>
> 17.
> bool pub_no_list = true;
>
> nit - Not caused by this patch, but it's closely related; In passing
> we should declare this variable at a lower scope, and rename it to
> 'isnull' which is more in keeping with the comments around it.

I was not sure if this should be done in this patch, there was a
similar opinion from Amit also at [1].

> ~~~
>
> 18.
> + /*
> + * For non-column list publications—such as TABLE (without a column
> + * list), ALL TABLES, or ALL TABLES IN SCHEMA publications consider
> + * all columns of the table, including generated columns, based on the
> + * pubgencols option.
> + */
>
> Some minor tweaks.
>
> SUGGESTION
> For non-column list publications — e.g. TABLE (without a column list),
> ALL TABLES, or ALL TABLES IN SCHEMA, we consider all columns of the
> table (including generated columns when 'publish_generated_columns'
> option is true).

Modified

> ~~~
>
> 19.
> + Assert(pub->pubgencols == entry->pubgencols);
> +
> + /*
> + * Retrieve the columns if they haven't been prepared yet, or if
> + * there are multiple publications.
> + */
> + if (!relcols && (list_length(publications) > 1))
> + {
> + pgoutput_ensure_entry_cxt(data, entry);
> + relcols = pub_getallcol_bitmapset(relation, entry->pubgencols,
> + entry->entry_cxt);
> + }
>
> 19a.
> Is that Assert correct? I ask only because AFAICT I saw in previous
> function (check_and_init_gencol) there is code that might change
> entry->pubgencols = false; even if the 'publish_generated_columns'
> option is true but there were not generated columns found in the
> table.

Removed Assert as we don't set the entry->pubgencols to the same value
as pubgencols when the table has no generated columns.

> ~
>
> 19b.
> The comment says "or if there are multiple publications" but the code
> says &&. Something seems wrong.

Modified the comment

> ~~~
>
> 20.
> + /*
> + * If no column list publications exit, columns will be selected later
> + * according to the generated columns option.
> + */
>
> 20a.
> typo - /exit/exist/

Modified

> ~
>
> 20b.
> There is a GENERAL problem that applies for lots of comments of this
> patch (including this comment) because the new publication option is
> referred to inconsistently in many different ways:
>
> e.g.
> - the generated columns option.
> - if the option is not specified
> - publish_generated_columns option.
> - the pubgencols option
> - 'publish_generated_columns' option
>
> All these references should be made the same. My personal preference
> is the last one ('publish_generated_columns' option).

Modified

> ~~~
>
> get_rel_sync_entry:
>
> 21.
> + /* Check whether to publish to generated columns. */
> + check_and_init_gencol(data, rel_publications, entry);
> +
>
> typo in comment - "publishe to"?

Modified

> ======
> src/include/catalog/pg_publication.h
>
> 22.
> extern Bitmapset *pub_collist_to_bitmapset(Bitmapset *columns, Datum pubcols,
> MemoryContext mcxt);
> +extern Bitmapset *pub_getallcol_bitmapset(Relation relation, bool pubgencols,
> + MemoryContext mcxt);
>
> Maybe a better name for this new function is 'pub_allcols_bitmapse'.
> That would then be consistent with the other BMS function which
> doesn't include the word "get".

Changed it to pub_form_cols_map

The attached v47 version patch has the changes for the same.

[1] - https://www.postgresql.org/message-id/CAA4eK1%2B%2Bae6KqKv35b%2BQODPn1PxuiGCnL-B_m4T678XfLX0qXw%40mail.gmail.com

Regards,
Vignesh

Attachment Content-Type Size
v47-0002-DOCS-Generated-Column-Replication.patch text/x-patch 12.6 KB
v47-0001-Enable-support-for-publish_generated_columns-opt.patch text/x-patch 99.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message vignesh C 2024-11-04 15:11:55 Re: Pgoutput not capturing the generated columns
Previous Message Aleksander Alekseev 2024-11-04 14:46:46 Re: New function normal_rand_array function to contrib/tablefunc.