Re: Pgoutput not capturing the generated columns

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Shubham Khanna <khannashubham1197(at)gmail(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Rajendra Kumar Dangwal <dangwalrajendra888(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, euler(at)eulerto(dot)com
Subject: Re: Pgoutput not capturing the generated columns
Date: 2024-10-21 05:21:13
Message-ID: CALDaNm0qzxiastyGOMVmxPhRYyM7ET8fkdyVBnDS6QHz2YrbdQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 18 Oct 2024 at 17:42, Shubham Khanna
<khannashubham1197(at)gmail(dot)com> wrote:
>
> On Thu, Oct 17, 2024 at 12:58 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> >
> > On Wed, 16 Oct 2024 at 23:25, Shubham Khanna
> > <khannashubham1197(at)gmail(dot)com> wrote:
> > >
> > > On Wed, Oct 9, 2024 at 9:08 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > > >
> > > > On Tue, 8 Oct 2024 at 11:37, Shubham Khanna <khannashubham1197(at)gmail(dot)com> wrote:
> > > > >
> > > > > On Fri, Oct 4, 2024 at 9:36 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> > > > > >
> > > > > > Hi Shubham, here are my review comments for v36-0001.
> > > > > >
> > > > > > ======
> > > > > > 1. General - merge patches
> > > > > >
> > > > > > It is long past due when patches 0001 and 0002 should've been merged.
> > > > > > AFAIK the split was only because historically these parts had
> > > > > > different authors. But, keeping them separated is not helpful anymore.
> > > > > >
> > > > > > ======
> > > > > > src/backend/catalog/pg_publication.c
> > > > > >
> > > > > > 2.
> > > > > > Bitmapset *
> > > > > > -pub_collist_validate(Relation targetrel, List *columns)
> > > > > > +pub_collist_validate(Relation targetrel, List *columns, bool pubgencols)
> > > > > >
> > > > > > Since you removed the WARNING, this parameter 'pubgencols' is unused
> > > > > > so it should also be removed.
> > > > > >
> > > > > > ======
> > > > > > src/backend/replication/pgoutput/pgoutput.c
> > > > > >
> > > > > > 3.
> > > > > > /*
> > > > > > - * If the publication is FOR ALL TABLES then it is treated the same as
> > > > > > - * if there are no column lists (even if other publications have a
> > > > > > - * list).
> > > > > > + * To handle cases where the publish_generated_columns option isn't
> > > > > > + * specified for all tables in a publication, we must create a column
> > > > > > + * list that excludes generated columns. So, the publisher will not
> > > > > > + * replicate the generated columns.
> > > > > > */
> > > > > > - if (!pub->alltables)
> > > > > > + if (!(pub->alltables && pub->pubgencols))
> > > > > >
> > > > > > I still found that comment hard to understand. Does this mean to say
> > > > > > something like:
> > > > > >
> > > > > > ------
> > > > > > Process potential column lists for the following cases:
> > > > > >
> > > > > > a. Any publication that is not FOR ALL TABLES.
> > > > > >
> > > > > > b. When the publication is FOR ALL TABLES and
> > > > > > 'publish_generated_columns' is false.
> > > > > > A FOR ALL TABLES publication doesn't have user-defined column lists,
> > > > > > so all columns will be replicated by default. However, if
> > > > > > 'publish_generated_columns' is set to false, column lists must still
> > > > > > be created to exclude any generated columns from being published
> > > > > > ------
> > > > > >
> > > > > > ======
> > > > > > src/test/regress/sql/publication.sql
> > > > > >
> > > > > > 4.
> > > > > > +SET client_min_messages = 'WARNING';
> > > > > > +CREATE TABLE gencols (a int, gen1 int GENERATED ALWAYS AS (a * 2) STORED);
> > > > > >
> > > > > > AFAIK you don't need to keep changing 'client_min_messages',
> > > > > > particularly now that you've removed the WARNING message that was
> > > > > > previously emitted.
> > > > > >
> > > > > > ~
> > > > > >
> > > > > > 5.
> > > > > > nit - minor comment changes.
> > > > > >
> > > > > > ======
> > > > > > Please refer to the attachment which implements any nits from above.
> > > > > >
> > > > >
> > > > > I have fixed all the given comments. Also, I have created a new 0003
> > > > > patch for the TAP-Tests related to the '011_generated.pl' file. I am
> > > > > planning to merge 0001 and 0003 patches once they will get fixed.
> > > > > The attached patches contain the required changes.
> > > >
> > > > Few comments:
> > > > 1) Since we are no longer throwing an error for generated columns, the
> > > > function header comments also need to be updated accordingly " Checks
> > > > for and raises an ERROR for any; unknown columns, system columns,
> > > > duplicate columns or generated columns."
> > > > - if (TupleDescAttr(tupdesc, attnum - 1)->attgenerated)
> > > > - ereport(ERROR,
> > > > -
> > > > errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
> > > > - errmsg("cannot use generated
> > > > column \"%s\" in publication column list",
> > > > - colname));
> > > > -
> > > >
> > > > 2) Tab completion missing for "PUBLISH_GENERATED_COLUMNS" option in
> > > > ALTER PUBLICATION ... SET (
> > > > postgres=# alter publication pub2 set (PUBLISH
> > > > PUBLISH PUBLISH_VIA_PARTITION_ROOT
> > > >
> > > > 3) I was able to compile without this include, may be this is not required:
> > > > --- a/src/backend/replication/logical/tablesync.c
> > > > +++ b/src/backend/replication/logical/tablesync.c
> > > > @@ -118,6 +118,7 @@
> > > > #include "utils/builtins.h"
> > > > #include "utils/lsyscache.h"
> > > > #include "utils/memutils.h"
> > > > +#include "utils/rel.h"
> > > >
> > > > 4) You can include "\dRp+ pubname" after each of the create/alter
> > > > publication to verify the columns that will be published:
> > > > +-- Test the 'publish_generated_columns' parameter enabled or disabled for
> > > > +-- different scenarios with/without generated columns in column lists.
> > > > +CREATE TABLE gencols (a int, gen1 int GENERATED ALWAYS AS (a * 2) STORED);
> > > > +
> > > > +-- Generated columns in column list, when 'publish_generated_columns'=false
> > > > +CREATE PUBLICATION pub1 FOR table gencols(a, gen1) WITH
> > > > (publish_generated_columns=false);
> > > >
> > > > +-- Generated columns in column list, when 'publish_generated_columns'=true
> > > > +CREATE PUBLICATION pub2 FOR table gencols(a, gen1) WITH
> > > > (publish_generated_columns=true);
> > > > +
> > > > +-- Generated columns in column list, then set
> > > > 'publication_generate_columns'=false
> > > > +ALTER PUBLICATION pub2 SET (publish_generated_columns = false);
> > > > +
> > > > +-- Remove generate columns from column list, when
> > > > 'publish_generated_columns'=false
> > > > +ALTER PUBLICATION pub2 SET TABLE gencols(a);
> > > > +
> > > > +-- Add generated columns in column list, when 'publish_generated_columns'=false
> > > > +ALTER PUBLICATION pub2 SET TABLE gencols(a, gen1);
> > > >
> > >
> > > I have fixed all the given comments. The attached patches contain the
> > > required changes.
> >
> > Few comments:
> > 1) This change is not required:
> > diff --git a/src/backend/catalog/pg_subscription.c
> > b/src/backend/catalog/pg_subscription.c
> > index 9efc9159f2..fcfbf86c0b 100644
> > --- a/src/backend/catalog/pg_subscription.c
> > +++ b/src/backend/catalog/pg_subscription.c
> > @@ -551,3 +551,34 @@ GetSubscriptionRelations(Oid subid, bool not_ready)
> >
> > return res;
> > }
> > +
> > +/*
> > + * Add publication names from the list to a string.
> > + */
> > +void
> > +get_publications_str(List *publications, StringInfo dest, bool quote_literal)
> > +{
> > + ListCell *lc;
> > + bool first = true;
> > +
> > + Assert(publications != NIL);
> > +
> > + foreach(lc, publications)
> > + {
> > + char *pubname = strVal(lfirst(lc));
> > +
> > + if (first)
> > + first = false;
> > + else
> > + appendStringInfoString(dest, ", ");
> > +
> > + if (quote_literal)
> > + appendStringInfoString(dest,
> > quote_literal_cstr(pubname));
> > + else
> > + {
> > + appendStringInfoChar(dest, '"');
> > + appendStringInfoString(dest, pubname);
> > + appendStringInfoChar(dest, '"');
> > + }
> > + }
> > +}
> >
> > It can be moved to subscriptioncmds.c file as earlier.
> >
> > 2) This line change is not required:
> > * Process and validate the 'columns' list and ensure the
> > columns are all
> > - * valid to use for a publication. Checks for and raises
> > an ERROR for
> > - * any; unknown columns, system columns, duplicate
> > columns or generated
> > - * columns.
> > + * valid to use for a publication. Checks for and raises
> > an ERROR for
> >
> > 3) Can we store this information in LogicalRepRelation instead of
> > having a local variable as column information is being stored, that
> > way remotegenlist and remotegenlist_res can be removed and code will
> > be more simpler:
> > + if (server_version >= 180000)
> > + {
> > + remotegenlist[natt] =
> > DatumGetBool(slot_getattr(slot, 5, &isnull));
> > +
> > + /*
> > + * If the column is generated and neither the
> > generated column
> > + * option is specified nor it appears in the
> > column list, we will
> > + * skip it.
> > + */
> > + if (remotegenlist[natt] &&
> > !has_pub_with_pubgencols && !included_cols)
> > + {
> > + ExecClearTuple(slot);
> > + continue;
> > + }
> > + }
> > +
> > rel_colname = TextDatumGetCString(slot_getattr(slot,
> > 2, &isnull));
> > Assert(!isnull);
> >
> > @@ -1015,7 +1112,7 @@ fetch_remote_table_info(char *nspname, char *relname,
> > ExecDropSingleTupleTableSlot(slot);
> >
> > lrel->natts = natt;
> > -
> > + *remotegenlist_res = remotegenlist;
>
> I have fixed all the given comments. The attached v40-0001 patch
> contains the required changes.

Few comments:
1) Add a test case to ensure that an error is properly raised for the
issue reported by Swada-san in [1]:
create table t (a int not null, b int generated always as (a + 1)
stored not null);
create unique index t_idx on t (b);
alter table t replica identity using index t_idx;
insert into t values (1);
update t set a = 100 where a = 1;

2) The existing comments only reference the column list, so we should
revise them to include an important point about generated columns. If
generated columns are set to false, it may result in some columns not
being replicated:
+ if (!isnull)
+ {
+ /* With REPLICA IDENTITY FULL, no column list
is allowed. */
+ if (relation->rd_rel->relreplident ==
REPLICA_IDENTITY_FULL)
+ result = true;
+
+ /* Transform the column list datum to a bitmapset. */
+ columns = pub_collist_to_bitmapset(NULL, datum, NULL);
+ }
+ else
+ {
+ TupleDesc desc = RelationGetDescr(relation);
+ int nliveatts = 0;
+
+ for (int i = 0; i < desc->natts; i++)
+ {
+ Form_pg_attribute att = TupleDescAttr(desc, i);
+
+ /* Skip if the attribute is dropped or
generated */
+ if (att->attisdropped)
+ continue;
+
+ nliveatts++;
+
+ if (att->attgenerated)
+ continue;
+
+ columns = bms_add_member(columns, i + 1);
+ }

3) Now that we are sending generated columns from the publisher, the
comment in the tuples_equal function is no longer accurate. We need to
update the comment to reflect the new behavior of the function.
Specifically, it should clarify how generated columns are considered
in the equality check:
/*
* Compare the tuples in the slots by checking if they have equal values.
*/
static bool
tuples_equal(TupleTableSlot *slot1, TupleTableSlot *slot2,
TypeCacheEntry **eq)
{
....

/*
* Ignore dropped and generated columns as the publisher doesn't send
* those
*/
if (att->attisdropped || att->attgenerated)
continue;

4) This change is not required:
@@ -1015,7 +1110,6 @@ fetch_remote_table_info(char *nspname, char *relname,
ExecDropSingleTupleTableSlot(slot);

lrel->natts = natt;
-
walrcv_clear_result(res);

[1] - https://www.postgresql.org/message-id/CAD21AoB%3DDBVDNCGBja%2BsDa2-w9tsM7_E%3DZgyw2qYMR1R0FwDsg%40mail.gmail.com

Regards,
Vignesh

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrei Lepikhov 2024-10-21 05:36:04 Re: type cache cleanup improvements
Previous Message shveta malik 2024-10-21 05:03:32 Re: Conflict Detection and Resolution