From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | Peter Smith <smithpb2250(at)gmail(dot)com> |
Cc: | vignesh C <vignesh21(at)gmail(dot)com>, Shubham Khanna <khannashubham1197(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Rajendra Kumar Dangwal <dangwalrajendra888(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, euler(at)eulerto(dot)com |
Subject: | Re: Pgoutput not capturing the generated columns |
Date: | 2024-11-04 10:01:09 |
Message-ID: | CAA4eK1++ae6KqKv35b+QODPn1PxuiGCnL-B_m4T678XfLX0qXw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, Nov 1, 2024 at 7:10 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
>
> ======
> doc/src/sgml/protocol.sgml
>
> 3.
> <para>
> - Next, one of the following submessages appears for each column:
> + Next, one of the following submessages appears for each column
> (except generated columns):
>
> Hmm. Now that generated column replication is supported is this change
> still required?
>
How about changing it to: "Next, one of the following submessages
appears for each published column:"? This is because the column may
not be sent because either it is not in the column list or a generated
one (with publish_generated_columns as false for respective
publication).
> ======
> doc/src/sgml/ref/create_publication.sgml
>
> 4.
> +
> + <varlistentry
> id="sql-createpublication-params-with-publish-generated-columns">
> + <term><literal>publish_generated_columns</literal>
> (<type>boolean</type>)</term>
> + <listitem>
> + <para>
> + Specifies whether the generated columns present in the tables
> + associated with the publication should be replicated.
> + The default is <literal>false</literal>.
> + </para>
> + </listitem>
> + </varlistentry>
> +
>
> I know that the subsequent DOCS patch V1-0002 will explain more about
> this, but as a stand-alone patch 0001 maybe you need to clarify that a
> publication column list will override this 'publish_generated_columns'
> parameter.
>
It is better to leave it to 0002 patch. But note in that patch, we
should add some reference link for the column_list behavior in the
create publication page as well.
> ======
> src/backend/catalog/pg_publication.c
>
>
> pub_getallcol_bitmapset:
>
> 6.
> +/*
> + * Return a column list bitmap for the specified table.
> + *
> + * Generated columns are included if pubgencols is true.
> + *
> + * If mcxt isn't NULL, build the bitmapset in that context.
> + */
> +Bitmapset *
> +pub_getallcol_bitmapset(Relation relation, bool pubgencols,
> + MemoryContext mcxt)
>
> IIUC this is a BMS of the table columns to be published. The function
> comment seems confusing to me when it says "column list bitmap"
> because IIUC this function is not really anything to do with a
> publication "column list", which is an entirely different thing.
>
We can probably name it pub_form_cols_map() and change the comments accordingly.
> ======
> src/backend/replication/logical/proto.c
>
> 7.
> static void logicalrep_write_attrs(StringInfo out, Relation rel,
> - Bitmapset *columns);
> + Bitmapset *columns, bool pubgencols);
> static void logicalrep_write_tuple(StringInfo out, Relation rel,
> TupleTableSlot *slot,
> - bool binary, Bitmapset *columns);
> + bool binary, Bitmapset *columns,
> + bool pubgencols);
>
> The meaning of all these new 'pubgencols' are ambiguous. e.g. Are they
> (a) The value of the CREATE PUBLICATION 'publish_generate_columns'
> parameter, or does it mean (b) Just some generated column is being
> published (maybe via column list or maybe not).
>
> I think it means (a) but, if true, that could be made much more clear
> by changing all of these names to 'pubgencols_option' or something
> similar. Actually, now I have doubts about that also -- I think this
> might be magically assigned to false if no generated columns exist in
> the table. Anyway, please do whatever you can to disambiguate this.
>
To make it clear we can name this parameter as include_gencols.
Similarly, change the name of RelationSyncEntry's new member.
> ~~~
>
> 9.
> bool
> logicalrep_should_publish_column(Form_pg_attribute att, Bitmapset *columns,
> bool pubgencols)
> {
> if (att->attisdropped)
> return false;
>
> /*
> * Skip publishing generated columns if they are not included in the
> * column list or if the option is not specified.
> */
> if (!columns && !pubgencols && att->attgenerated)
> return false;
>
> /*
> * Check if a column is covered by a column list.
> */
> if (columns && !bms_is_member(att->attnum, columns))
> return false;
>
> return true;
> }
>
> Same as mentioned before in my previous v46-0001 review comments, I
> feel that the conditionals of this function are over-complicated and
> that there are more 'return' points than necessary. The alternative
> code below looks simpler to me.
>
> SUGGESTION
> bool
> logicalrep_should_publish_column(Form_pg_attribute att, Bitmapset *columns,
> bool pubgencols_option)
> {
> if (att->attisdropped)
> return false;
>
> if (columns)
> {
> /*
> * Has a column list:
> * Publish only cols named in that list.
> */
> return bms_is_member(att->attnum, columns);
> }
> else
> {
> /*
> * Has no column list:
> * Publish generated cols only if 'publish_generated_cols' is true.
> * Publish all non-generated cols.
> */
> return att->attgenerated ? pubgencols_option : true;
> }
> }
>
Fair enough but do we need else in the above code?
> ======
> src/backend/replication/pgoutput/pgoutput.c
>
> 10.
> + /* Include publishing generated columns */
> + bool pubgencols;
> +
>
> There is similar ambiguity here with this field-name as was mentioned
> about for other 'pbgencols' function params. I had initially thought
> that this this just caries around same value as the publication option
> 'publish_generated_columns' but now (after looking at function
> check_and_init_gencol) I think that might not be the case because I
> saw it can be assigned false (overriding the publication option?).
>
> Anyway, the comment needs to be made much clearer about what is the
> true meaning of this field. Or, rename it if there is a better name.
>
As suggested above, we can name it as include_gencols.
>
> send_relation_and_attrs:
>
> 12.
> - if (!logicalrep_should_publish_column(att, columns))
> + if (!logicalrep_should_publish_column(att, columns, relentry->pubgencols))
> continue;
> It seemed a bit strange/inconsistent that 'columns' was assigned to a
> local var, but 'pubgencols' was not, given they are both fields of the
> same struct. Maybe removing this 'columns' var would be consistent
> with other code in this patch.
>
I think the other way would be better. I mean take another local
variable for this function. We don't need to always do the same in
such cases.
> ~~~
>
> 13.
> check_and_init_gencol:
>
> nit - missing periods for comments.
>
> ~~~
>
> 14.
> + /* There is no generated columns to be published *
>
> /There is no generated columns/There are no generated columns/
>
> ~~~
>
> 15.
> + foreach(lc, publications)
> + {
> + Publication *pub = lfirst(lc);
>
> AFAIK this can be re-written using a different macro to avoid needing
> the 'lc' var.
>
> ~~~
>
> pgoutput_column_list_init:
>
> 16.
> + bool collistpubexist = false;
>
> This seemed like not a very good name, How about 'found_pub_with_collist';
>
> ~~~
>
> 17.
> bool pub_no_list = true;
>
> nit - Not caused by this patch, but it's closely related; In passing
> we should declare this variable at a lower scope, and rename it to
> 'isnull' which is more in keeping with the comments around it.
>
Moving to local scope is okay but doing more than that in this patch
is not advisable even if your suggestion is a good idea which I am not
sure.
> ~
>
> 20b.
> There is a GENERAL problem that applies for lots of comments of this
> patch (including this comment) because the new publication option is
> referred to inconsistently in many different ways:
>
> e.g.
> - the generated columns option.
> - if the option is not specified
> - publish_generated_columns option.
> - the pubgencols option
> - 'publish_generated_columns' option
>
> All these references should be made the same. My personal preference
> is the last one ('publish_generated_columns' option).
>
I have responded with a better name for other places. Here, the
proposed name seems okay to me.
--
With Regards,
Amit Kapila.
From | Date | Subject | |
---|---|---|---|
Next Message | Bertrand Drouvot | 2024-11-04 10:01:50 | Re: per backend I/O statistics |
Previous Message | Bertrand Drouvot | 2024-11-04 09:50:05 | Re: define pg_structiszero(addr, s, r) |