Re: Pgoutput not capturing the generated columns

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>
Cc: Shubham Khanna <khannashubham1197(at)gmail(dot)com>, Rajendra Kumar Dangwal <dangwalrajendra888(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, euler(at)eulerto(dot)com
Subject: Re: Pgoutput not capturing the generated columns
Date: 2024-07-09 01:44:10
Message-ID: CAHut+PuboE6yvVV+iihWtVeuuptfFP6ZFAabCTUzhFY_cUpn2w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Shlok, Here are my review comments for v16-0002

======
src/backend/replication/logical/tablesync.c

1. fetch_remote_table_info

+ if ((server_version >= 120000 && server_version < 180000) ||
+ !MySubscription->includegencols)
+ appendStringInfo(&cmd, " AND a.attgenerated = ''");

I felt this condition was a bit complicated. it needs a comment to
explain that "attgenerated" has been supported only since >= PG12 and
'include_generated_columns' is supported only since >= PG18. The more
I look at this I think this is a bug. For example, what happens if the
server is *before* PG12 and include_generated_cols is false; won't it
then try to build SQL using the "attgenerated" column which will cause
an ERROR on the server?

IIRC this condition is already written properly in your patch 0003.
So, most of that 0003 condition refactoring should be done here in
patch 0002 instead.

~~~

2. copy_table

> > So, actually this loop seems to be only finding cases (setting
> > remote_has_gen = true) where the remote column is generated but the
> > match local column is *not* generated. Maybe this was the intended
> > logic all along but then certainly the comment should be improved to
> > describe it better.
>
> 'remotegenlist' is actually constructed in function 'fetch_remote_table_info'
> and it has an entry for every column in the column list specifying
> whether a column is
> generated or not.
> In the function 'make_copy_attnamelist' we are not modifying the list.
> So, I think the current comment would be sufficient. Thoughts?

Yes, I was mistaken thinking the list is "modified". OTOH, I still
feel the existing comment ("Check if remote column list has any
generated column") is misleading because the remote table might have
generated cols but we are not even interested in them if the
equivalent subscriber column is also generated. Please see nitpicks
diff, for my suggestion how to update this comment.

~~~

nitpick - add space after "if"

======
src/test/subscription/t/004_sync.pl

> > 5.
> > Here, you are confirming we get an ERROR when replicating from a
> > non-generated column to a generated column. But I think your patch
> > also added exactly that same test scenario in the 011_generated (as
> > the sub5 test). So, maybe this one here should be removed?
>
> For 0004_sync.pl, it is tested when 'include_generated_columns' is not
> specified. Whereas for the test in 011_generated
> 'include_generated_columns = true' is specified.
> I thought we should have a test for both cases to test if the error
> message format is the same for both cases. Thoughts?

3.
Sorry, I missed that there was a parameter flag difference. Anyway,
since the code-path to reach this error is the same regardless of the
'include_generated_columns' parameter value IMO having too many tests
might be overkill. YMMV.

Anyway, whether you decide to keep both test cases or not, I think all
testing related to generated column replication belongs in the new
001_generated.pl TAP file -- not here in 04_sync.pl
.
======
src/test/subscription/t/011_generated.pl

4. Untested scenarios for "missing col"?

I have seen (in 04_sync.pl) missing column test cases for:
- publisher not-generated col ==> subscriber missing column

Maybe I am mistaken, but I don't recall seeing any test cases for:
- publisher generated-col ==> subscriber missing col

Unless they are already done somewhere, I think this scenario should
be in 011_generated.pl. Furthermore, maybe it needs to be tested for
both include_generated_columns = true / false, because if the
parameter is false it should be OK, but if the parameter is true it
should give ERROR.

~~~

5.
-# publisher-side tab3 has generated col 'b' but subscriber-side tab3
has DIFFERENT COMPUTATION generated col 'b'.
+# tab3:
+# publisher-side tab3 has generated col 'b' but
+# subscriber-side tab3 has DIFFERENT COMPUTATION generated col 'b'.

I think this change is only improving a comment that was introduced by
patch 0001. This all belongs back in patch 0001, then patch 0002 has
nothing to do here.

======
99.
Please also refer to the attached diffs patch which implements any
nitpicks mentioned above.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment Content-Type Size
PS_NITPICKS_20240709_V160002.txt text/plain 740 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2024-07-09 01:45:05 Re: Pluggable cumulative statistics
Previous Message Junwang Zhao 2024-07-09 01:44:00 Re: jsonpath: Inconsistency of timestamp_tz() Output