From: | vignesh C <vignesh21(at)gmail(dot)com> |
---|---|
To: | Peter Smith <smithpb2250(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Shubham Khanna <khannashubham1197(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Rajendra Kumar Dangwal <dangwalrajendra888(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, euler(at)eulerto(dot)com |
Subject: | Re: Pgoutput not capturing the generated columns |
Date: | 2024-11-05 05:47:02 |
Message-ID: | CALDaNm3Ha5t9bOLJ7OBnaMRgYHX_Q4j9k3EbRsX=+1mxUo5BZw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, 5 Nov 2024 at 07:00, Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Hi Vignesh,
>
> Here are my review comments for your latest patch v47-0001.
>
> ======
> doc/src/sgml/ddl.sgml
>
> 1.
> <para>
> - Generated columns can be replicated during logical replication by
> - including them in the column list of the
> - <command>CREATE PUBLICATION</command> command.
> + Generated columns are allowed to be replicated during logical replication
> + according to the <command>CREATE PUBLICATION</command> option
> + <link linkend="sql-createpublication-params-with-publish-generated-columns">
> + <literal>include_generated_columns</literal></link> or by including them
> + in the column list of the <command>CREATE PUBLICATION</command> command.
> </para>
>
> 1a.
> This text gives the wrong name for the new parameter.
> /include_generated_columns/publish_generated_columns/
>
> ~
>
> 1b.
> Everywhere in this patch (except here), this is called the
> 'publish_generated_columns' parameter (not "option") so it should be
> called a parameter here also. Anyway, apparently that is the docs rule
> -- see [1].
>
> BTW, the same applies for the commit message 1st line of this patch:
> [PATCH v47 1/2] Enable support for 'publish_generated_columns' option.
> Should be
> [PATCH v47 1/2] Enable support for 'publish_generated_columns' parameter.
Modified to keep it consistent
> ======
> doc/src/sgml/protocol.sgml
>
> 2.
> - Next, one of the following submessages appears for each column:
> + Next, one of the following submessages appears for each published column:
>
> The change is OK. But, note that there are other descriptions just
> like this one on the same page, so if you are going to say "published"
> here, then to be consistent you probably want to consider updating the
> other places as well.
This can be done later after this patch is committed
> ======
> src/backend/catalog/pg_publication.c
>
> 3.
> +bool
> +has_column_list_defined(Publication *pub, Oid relid)
> +{
> + HeapTuple cftuple = NULL;
> + bool isnull = true;
>
> Since you chose not to rearrange the HeapTupleIsValid check, this
> 'isnull' declaration should be relocated within the if-block.
Modified
> ======
> src/backend/replication/logical/proto.c
>
> 4.
> /*
> * Check if the column 'att' of a table should be published.
> *
> - * 'columns' represents the column list specified for that table in the
> - * publication.
> + * 'columns' represents the publication column list (if any) for that table.
> *
> - * Note that generated columns can be present only in 'columns' list.
> + * Note that generated columns can be published only when present in a
> + * publication column list, or when include_gencols is true.
> */
> bool
> -logicalrep_should_publish_column(Form_pg_attribute att, Bitmapset *columns)
> +logicalrep_should_publish_column(Form_pg_attribute att, Bitmapset *columns,
> + bool include_gencols)
>
> The function comment describes 'columns' but it doesn't describe
> 'include_gencols'. I think knowing more about that parameter would be
> helpful.
>
> SUGGESTION:
> The 'include_gencols' flag indicates whether generated columns should
> be published when there is no column list. Typically, this will have
> the same value as the 'publish_generated_columns' publication
> parameter.
Modified
> ======
> src/backend/replication/logical/relation.c
>
> 5.
> @@ -421,7 +421,7 @@ logicalrep_rel_open(LogicalRepRelId remoteid,
> LOCKMODE lockmode)
> int attnum;
> Form_pg_attribute attr = TupleDescAttr(desc, i);
>
> - if (attr->attisdropped || attr->attgenerated)
> + if (attr->attisdropped)
> {
> entry->attrmap->attnums[i] = -1;
> continue;
> @@ -432,7 +432,15 @@ logicalrep_rel_open(LogicalRepRelId remoteid,
> LOCKMODE lockmode)
>
> entry->attrmap->attnums[i] = attnum;
> if (attnum >= 0)
> + {
> + if (attr->attgenerated)
> + ereport(ERROR,
> + errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("replicating to a target relation's generated column \"%s\"
> for \"%s.%s\" is not supported",
> + NameStr(attr->attname), remoterel->nspname, remoterel->relname));
> +
> missingatts = bms_del_member(missingatts, attnum);
> + }
>
> Hmm. I think this more descriptive error is a good improvement over
> the previous "missing" error, but I just don't think it belongs in
> this patch. This is impacting the existing "regular" ==> "generated"
> replication as well, which seems out-of-scope for this gencols patch.
>
> IMO this ought to be made as a separate patch that can be pushed to
> master separately/independently *before* any of this new gencols
> stuff.
>
> Also, you already said in the commit message:
> * Publisher not-generated column => subscriber generated column:
> This will give ERROR (not changed by this patch).
>
> So the "not changed by this patch" part is not true if these changes
> are included.
That makes sense, I will post a separate patch for this after this work is done.
> ======
> src/backend/replication/pgoutput/pgoutput.c
>
> 6.
> + /*
> + * Include publishing generated columns if 'publish_generated_columns'
> + * parameter is set to true, this will be set only if the relation
> + * contains any generated column.
> + */
> + bool include_gencols;
> +
>
> Minor rewording.
>
> SUGGESTION:
> Include generated columns for publication is set true if
> 'publish_generated_columns' parameter is true, and the relation
> contains generated columns.
Modified
> ~~~
>
> 7.
> + /*
> + * Retrieve the columns if they haven't been prepared yet, and
> + * only if multiple publications exist.
> + */
> + if (!relcols && (list_length(publications) > 1))
> + {
> + pgoutput_ensure_entry_cxt(data, entry);
> + relcols = pub_form_cols_map(relation, entry->include_gencols,
> + entry->entry_cxt);
> + }
>
> IIUC the purpose of this is for ensuring that the column lists are
> consistent across all publications. That is why we only do this when
> there are > 1 publications. For the 1st publication with no column
> list we cache all the columns (in 'relcols') so later the cols of the
> *current* publication (in 'cols') can be checked to see if they are
> the same.
>
> TBH, I think this part needs to have more explanation because it's a
> bit too subtle; you have to read between the lines to figure out what
> it is doing instead of just having a comment to clearly describe the
> logic up-front.
Added comments.
The attached v48 version has the changes for the same.
Regards,
Vignesh
Attachment | Content-Type | Size |
---|---|---|
v48-0002-DOCS-Generated-Column-Replication.patch | text/x-patch | 12.7 KB |
v48-0001-Enable-support-for-publish_generated_columns-par.patch | text/x-patch | 98.6 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | vignesh C | 2024-11-05 05:48:41 | Re: Pgoutput not capturing the generated columns |
Previous Message | Amit Kapila | 2024-11-05 05:13:05 | Re: Pgoutput not capturing the generated columns |