Re: Pgoutput not capturing the generated columns

From: Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Shubham Khanna <khannashubham1197(at)gmail(dot)com>, Rajendra Kumar Dangwal <dangwalrajendra888(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, euler(at)eulerto(dot)com
Subject: Re: Pgoutput not capturing the generated columns
Date: 2024-07-04 10:28:35
Message-ID: CANhcyEW95M_usF1OJDudeejs0L0+YOEb=dXmt_4Hs-70=CXa-g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 25 Jun 2024 at 11:56, Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Here are some review comments for the patch v10-0002.
>
> ======
> Commit Message
>
> 1.
> Note that we don't copy columns when the subscriber-side column is also
> generated. Those will be filled as normal with the subscriber-side computed or
> default data.
>
> ~
>
> Now this patch also introduced some errors etc, so I think that patch
> comment should be written differently to explicitly spell out
> behaviour of every combination, something like the below:
>
> Summary
>
> when (include_generated_column = true)
>
> * publisher not-generated column => subscriber not-generated column:
> This is just normal logical replication (not changed by this patch).
>
> * publisher not-generated column => subscriber generated column: This
> will give ERROR.
>
> * publisher generated column => subscriber not-generated column: The
> publisher generated column value is copied.
>
> * publisher generated column => subscriber generated column: The
> publisher generated column value is not copied. The subscriber
> generated column will be filled with the subscriber-side computed or
> default data.
>
> when (include_generated_columns = false)
>
> * publisher not-generated column => subscriber not-generated column:
> This is just normal logical replication (not changed by this patch).
>
> * publisher not-generated column => subscriber generated column: This
> will give ERROR.
>
> * publisher generated column => subscriber not-generated column: This
> will replicate nothing. Publisher generate-column is not replicated.
> The subscriber column will be filled with the subscriber-side default
> data.
>
> * publisher generated column => subscriber generated column: This
> will replicate nothing. Publisher generate-column is not replicated.
> The subscriber generated column will be filled with the
> subscriber-side computed or default data.
Modified

> ======
> src/backend/replication/logical/relation.c
>
> 2.
> logicalrep_rel_open:
>
> I tested some of the "missing column" logic, and got the following results:
>
> Scenario A:
> PUB
> test_pub=# create table t2(a int, b int);
> test_pub=# create publication pub2 for table t2;
> SUB
> test_sub=# create table t2(a int, b int generated always as (a*2) stored);
> test_sub=# create subscription sub2 connection 'dbname=test_pub'
> publication pub2 with (include_generated_columns = false);
> Result:
> ERROR: logical replication target relation "public.t2" is missing
> replicated column: "b"
>
> ~
>
> Scenario B:
> PUB/SUB identical to above, but subscription sub2 created "with
> (include_generated_columns = true);"
> Result:
> ERROR: logical replication target relation "public.t2" has a
> generated column "b" but corresponding column on source relation is
> not a generated column
>
> ~~~
>
> 2a. Question
>
> Why should we get 2 different error messages for what is essentially
> the same problem according to whether the 'include_generated_columns'
> is false or true? Isn't the 2nd error message the more correct and
> useful one for scenarios like this involving generated columns?
>
> Thoughts?
Did the modification to give same error in both cases

> ~
>
> 2b. Missing tests?
>
> I also noticed there seems no TAP test for the current "missing
> replicated column" message. IMO there should be a new test introduced
> for this because the loop involved too much bms logic to go
> untested...
Added the tests 004_sync.pl

> ======
> src/backend/replication/logical/tablesync.c
>
> make_copy_attnamelist:
> NITPICK - minor comment tweak
> NITPICK - add some spaces after "if" code
Applied the changes

> 3.
> Should you pfree the gencollist at the bottom of this function when
> you no longer need it, for tidiness?
Fixed

> ~~~
>
> 4.
> static void
> -fetch_remote_table_info(char *nspname, char *relname,
> +fetch_remote_table_info(char *nspname, char *relname, bool **remotegenlist,
> LogicalRepRelation *lrel, List **qual)
> {
> WalRcvExecResult *res;
> StringInfoData cmd;
> TupleTableSlot *slot;
> Oid tableRow[] = {OIDOID, CHAROID, CHAROID};
> - Oid attrRow[] = {INT2OID, TEXTOID, OIDOID, BOOLOID};
> + Oid attrRow[] = {INT2OID, TEXTOID, OIDOID, BOOLOID, BOOLOID};
> Oid qualRow[] = {TEXTOID};
> bool isnull;
> + bool *remotegenlist_res;
>
> IMO the names 'remotegenlist' and 'remotegenlist_res' should be
> swapped the other way around, because it is the function parameter
> that is the "result", whereas the 'remotegenlist_res' is just the
> local working var for it.
Fixed

> ~~~
>
> 5. fetch_remote_table_info
>
> Now walrcv_server_version(LogRepWorkerWalRcvConn) is used in multiple
> places, I think it will be better to assign this to a 'server_version'
> variable to be used everywhere instead of having multiple function
> calls.
Fixed

> ~~~
>
> 6.
> "SELECT a.attnum,"
> " a.attname,"
> " a.atttypid,"
> - " a.attnum = ANY(i.indkey)"
> + " a.attnum = ANY(i.indkey),"
> + " a.attgenerated != ''"
> " FROM pg_catalog.pg_attribute a"
> " LEFT JOIN pg_catalog.pg_index i"
> " ON (i.indexrelid = pg_get_replica_identity_index(%u))"
> " WHERE a.attnum > 0::pg_catalog.int2"
> - " AND NOT a.attisdropped %s"
> + " AND NOT a.attisdropped", lrel->remoteid);
> +
> + if ((walrcv_server_version(LogRepWorkerWalRcvConn) >= 120000 &&
> + walrcv_server_version(LogRepWorkerWalRcvConn) <= 160000) ||
> + !MySubscription->includegencols)
> + appendStringInfo(&cmd, " AND a.attgenerated = ''");
> +
>
> If the server version is < PG12 then AFAIK there was no such thing as
> "a.attgenerated", so shouldn't that SELECT " a.attgenerated != ''"
> part also be guarded by some version checking condition like in the
> WHERE? Otherwise won't it cause an ERROR for old servers?
Fixed

> ~~~
>
> 7.
> /*
> - * For non-tables and tables with row filters, we need to do COPY
> - * (SELECT ...), but we can't just do SELECT * because we need to not
> - * copy generated columns. For tables with any row filters, build a
> - * SELECT query with OR'ed row filters for COPY.
> + * For non-tables and tables with row filters and when
> + * 'include_generated_columns' is specified as 'true', we need to do
> + * COPY (SELECT ...), as normal COPY of generated column is not
> + * supported. For tables with any row filters, build a SELECT query
> + * with OR'ed row filters for COPY.
> */
>
> NITPICK. I felt this was not quite right. AFAIK the reasons for using
> this COPY (SELECT ...) syntax is different for row-filters and
> generated-columns. Anyway, I updated the comment slightly in my
> nitpicks attachment. Please have a look at it to see if you agree with
> the suggestions. Maybe I am wrong.
Fixed

> ~~~
>
> 8.
> - for (int i = 0; i < lrel.natts; i++)
> + foreach_ptr(String, att_name, attnamelist)
>
> I'm not 100% sure, but isn't foreach_node the macro to use here,
> rather than foreach_ptr?
Fixed

> ======
> src/test/subscription/t/011_generated.pl
>
> 9.
> Please discuss with Shubham how to make all the tab1, tab2, tab3,
> tab4, tab5, tab6 comments use the same kind of style/wording.
> Currently, the patches 0001 and 0002 test comments are a bit
> inconsistent.
Fixed

> ~~~
>
> 10.
> Related to above -- now that patch 0002 supports copy_data=true I
> don't see why we need to test generated columns *both* for
> copy_data=false and also for copy_data=true. IOW, is it really
> necessary to have so many tables/tests? For example, I am thinking
> some of those tests from patch 0001 can be re-used or just removed now
> that copy_data=true works.
Fixed

> ~~~
>
> NITPICK - minor comment tweak
Fixed

> ~~~
>
> 11.
> For tab4 and tab6 I saw the initial sync and normal replication data
> tests are all merged together, but I had expected to see the initial
> sync and normal replication data tests separated so it would be
> consistent with the earlier tab1, tab2, tab3 tests.
Fixed

> ======
>
> 99.
> Also, I have attached a nitpicks diff for some of the cosmetic review
> comments mentioned above. Please apply whatever of these that you
> agree with.
Applied the relevant changes

I have attached a v14 to fix the comments.

Thanks and Regards,
Shlok Kyal

Attachment Content-Type Size
v14-0002-Support-replication-of-generated-column-during-i.patch application/octet-stream 25.8 KB
v14-0001-Enable-support-for-include_generated_columns-opt.patch application/octet-stream 88.9 KB
v14-0003-Fix-behaviour-for-Virtual-Generated-columns.patch application/octet-stream 12.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Shlok Kyal 2024-07-04 10:31:31 Re: Pgoutput not capturing the generated columns
Previous Message Aleksander Alekseev 2024-07-04 10:27:51 Re: Problem while installing PostgreSQL using make