Re: Pgoutput not capturing the generated columns

From: Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Shubham Khanna <khannashubham1197(at)gmail(dot)com>, Rajendra Kumar Dangwal <dangwalrajendra888(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, euler(at)eulerto(dot)com
Subject: Re: Pgoutput not capturing the generated columns
Date: 2024-06-20 13:00:19
Message-ID: CANhcyEXmjLEPNgOSAtjS4YGb9JvS8w-SO9S+jRzzzXo2RavNWw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 18 Jun 2024 at 10:57, Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Hi, here are my review comments for patch v7-0002
>
> ======
> Commit Message
>
> NITPICKS
> - rearrange paragraphs
> - typo "donot"
> - don't start a sentence with "And"
> - etc.
>
> Please see the attachment for my suggested commit message text updates
> and take from it whatever you agree with.
>
Fixed

> ======
> doc/src/sgml/ref/create_subscription.sgml
>
> 1.
> + If the subscriber-side column is also a generated column
> then this option
> + has no effect; the replicated data will be ignored and the subscriber
> + column will be filled as normal with the subscriber-side computed or
> + default data. And during table synchronization, the data
> corresponding to
> + the generated column on subscriber-side will not be sent from the
> + publisher to the subscriber.
>
> This text already mentions subscriber-side generated cols. IMO you
> don't need to say anything at all about table synchronization --
> that's just an internal code optimization, which is not something the
> user needs to know about. IOW, the entire last sentence ("And
> during...") should be removed.
>
Fixed

> ======
> src/backend/replication/logical/relation.c
>
> 2. logicalrep_rel_open
>
> - if (attr->attisdropped)
> + if (attr->attisdropped ||
> + (!MySubscription->includegencol && attr->attgenerated))
> {
> entry->attrmap->attnums[i] = -1;
> continue;
>
> ~
>
> Maybe I'm mistaken, but isn't this code for skipping checking for
> "missing" subscriber-side (aka local) columns? Can't it just
> unconditionally skip every attr->attgenerated -- i.e. why does it
> matter if the MySubscription->includegencol was set or not?
>
In case 'include_generated_columns' is 'true'. column list in
remoterel will have an entry for generated columns.
So, in this case if we skip every attr->attgenerated, we will get a
missing column error.

> ======
> src/backend/replication/logical/tablesync.c
>
> 3. make_copy_attnamelist
>
> - for (i = 0; i < rel->remoterel.natts; i++)
> + desc = RelationGetDescr(rel->localrel);
> +
> + for (i = 0; i < desc->natts; i++)
> {
> - attnamelist = lappend(attnamelist,
> - makeString(rel->remoterel.attnames[i]));
> + int attnum;
> + Form_pg_attribute attr = TupleDescAttr(desc, i);
> +
> + if (!attr->attgenerated)
> + continue;
> +
> + attnum = logicalrep_rel_att_by_name(&rel->remoterel,
> + NameStr(attr->attname));
> +
> + /*
> + * Check if subscription table have a generated column with same
> + * column name as a non-generated column in the corresponding
> + * publication table.
> + */
> + if (attnum >=0 && !attgenlist[attnum])
> + ereport(ERROR,
> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("logical replication target relation \"%s.%s\" is missing
> replicated column: \"%s\"",
> + rel->remoterel.nspname, rel->remoterel.relname, NameStr(attr->attname))));
> +
> + if (attnum >= 0)
> + gencollist = lappend_int(gencollist, attnum);
> }
>
> ~
>
> NITPICK - Use C99-style for loop variables
> NITPICK - Typo in comment
> NITPICK - spaces
>
> ~
>
> 3a.
> I think above code should be refactored so there is only one check for
> "if (attnum >= 0)" -- e.g. other condition should be nested.
>
> ~
>
> 3b.
> That ERROR message says "missing replicated column", but that doesn't
> seem much like what the code-comment was saying this code is about.
>
Fixed

> ~~~
>
> 4.
> + for (i = 0; i < rel->remoterel.natts; i++)
> + {
> +
> + if (gencollist != NIL && j < gencollist->length &&
> + list_nth_int(gencollist, j) == i)
> + j++;
> + else
> + attnamelist = lappend(attnamelist,
> + makeString(rel->remoterel.attnames[i]));
> + }
>
> NITPICK - Use C99-style for loop variables
> NITPICK - Unnecessary blank lines
>
> ~
>
> IIUC the subscriber-side table and the publisher-side table do NOT
> have to have all the columns in identical order for the logical
> replication to work correcly. AFAIK it works fine so long as the
> column names match for the replicated columns. Therefore, I am
> suspicious that this new patch code seems to be imposing some new
> ordering assumptions/restrictions (e.g. list_nth_int stuff) which are
> not current requirements.
>
> ~~~
>
> copy_table:
>
> NITPICK - comment typo
> NITPICK - comment wording
>
Fixed

> ~
>
> 5.
> + int i = 0;
> + ListCell *l;
> +
> appendStringInfoString(&cmd, "COPY (SELECT ");
> - for (int i = 0; i < lrel.natts; i++)
> + foreach(l, attnamelist)
> {
> - appendStringInfoString(&cmd, quote_identifier(lrel.attnames[i]));
> - if (i < lrel.natts - 1)
> + appendStringInfoString(&cmd, quote_identifier(strVal(lfirst(l))));
> + if (i < attnamelist->length - 1)
> appendStringInfoString(&cmd, ", ");
> + i++;
> }
> IIUC for new code like this, it is preferred to use the foreach*
> macros instead of ListCell.
>
Fixed

> ======
> src/test/regress/sql/subscription.sql
>
> 6.
> --- fail - copy_data and include_generated_columns are mutually
> exclusive options
> -CREATE SUBSCRIPTION sub2 CONNECTION 'dbname=regress_doesnotexist'
> PUBLICATION testpub WITH (include_generated_columns = true);
> -ERROR: copy_data = true and include_generated_columns = true are
> mutually exclusive options
>
> It is OK to delete this test now but IMO still needs to be some
> "include_generated_columns must be boolean" test cases (e.g. same as
> there was two_phase). Actually, this should probably be done by the
> 0001 patch.
>
Fixed

> ======
> src/test/subscription/t/011_generated.pl
>
> 7.
> All the PRIMARY KEY stuff may be overkill. Are primary keys really
> needed for these tests?
>
Fixed

> ~~~
>
> 8.
> +$node_publisher->safe_psql('postgres',
> + "CREATE TABLE tab4 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a
> * 2) STORED, c int GENERATED ALWAYS AS (a * 2) STORED)"
> +);
> +
> +$node_publisher->safe_psql('postgres',
> + "CREATE TABLE tab5 (a int PRIMARY KEY, b int)"
> +);
> +
>
> Maybe add comments on what is special about all these tables, so don't
> have to read the tests later to deduce their purpose.
>
> tab4: publisher-side generated col 'b' and 'c' ==> subscriber-side
> non-generated col 'b', and generated-col 'c'
> tab5: publisher-side non-generated col 'b' --> subscriber-side
> non-generated col 'b'
>
Fixed

> ~~~
>
> 9.
> +$node_subscriber->safe_psql('postgres',
> + "CREATE SUBSCRIPTION sub4 CONNECTION '$publisher_connstr'
> PUBLICATION pub4 WITH (include_generated_columns = true)"
> + );
> +
>
> All the publications are created together, and all the subscriptions
> are created together except for 'sub5'. Consider including a comment
> to say why you deliberately created the 'sub5' subscription separate
> from all others.
>
Fixed

> ======
>
> 99.
> Please also see my code nitpicks attachment patch for various other
> cosmetic problems, and apply them if you agree.
>
Applied the changes

I have fixed the comments and attached the patches. I have also
attached the v9-0003 patch. It will resolve the issue suggested by
Vignesh in [1]. I have also updated the documentation for the same.
v9-0001 - Not Modified
v9-0002 - Support replication of generated columns during initial sync.
v9-0003 - Fix behaviour of tablesync for Virtual Generated Columns.

[1]: https://www.postgresql.org/message-id/CALDaNm3Ufg872XqgPvBVzXHvUVenu-8%2BGz2dyEuKq3CN0UxfKw%40mail.gmail.com

Thanks and Regards,
Shlok Kyal

Attachment Content-Type Size
v9-0001-Currently-generated-column-values-are-not-replica.patch application/octet-stream 84.7 KB
v9-0002-Support-replication-of-generated-column-during-in.patch application/octet-stream 20.6 KB
v9-0003-Fix-tablesync-behaviour-for-Virtual-Generated-col.patch application/octet-stream 2.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Shlok Kyal 2024-06-20 13:03:15 Re: Pgoutput not capturing the generated columns
Previous Message vignesh C 2024-06-20 12:53:34 Re: Logical Replication of sequences