Re: Pgoutput not capturing the generated columns

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: Shubham Khanna <khannashubham1197(at)gmail(dot)com>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Rajendra Kumar Dangwal <dangwalrajendra888(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, euler(at)eulerto(dot)com
Subject: Re: Pgoutput not capturing the generated columns
Date: 2024-10-10 05:23:22
Message-ID: CAHut+Ps5X5LHs2Q1AHj=nvEu3Cz8BKG25GL6hqc6LmbgMYtwtQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Here are some comments for TAP test patch v37-0003.

I’m not in favour of the removal of such a large number of
'combination' and other 'misc' tests. In the commit message, please
delete me as a "co-author" of this patch.

======

1.
Any description or comment that still mentions "all combinations" is
no longer valid:

(e.g. in the comment message)
Add tests for all combinations of generated column replication.

(e.g. in the test file)
# The following test cases exercise logical replication for all combinations
# where there is a generated column on one or both sides of pub/sub:

and

# Furthermore, all combinations are tested using:

======
2.
+# --------------------------------------------------
+# Testcase: generated -> normal
+# Publisher table has generated column 'b'.
+# Subscriber table has normal column 'b'.
+# --------------------------------------------------
+

Now that COPY for generated columns is already implemented in patch
0001, shouldn't this test be using 'copy_data' enabled, so it can test
replication both for initial tablesync as well as normal replication?

That was the whole point of having the "# XXX copy_data=false for now.
This will be changed later." reminder comment in this file.

======

3.
Previously there were some misc tests to ensure that a generated
column which was then altered using DROP EXPRESSION would work as
expected. The test scenario was commented like:

+# =============================================================================
+# Misc test.
+#
+# A "normal -> generated" replication fails, reporting an error that the
+# subscriber side column is missing.
+#
+# In this test case we use DROP EXPRESSION to change the subscriber generated
+# column into a normal column, then verify replication works ok.
+# =============================================================================

Now in patch v37 this test no longer exists. Why?

======
4.
+# =============================================================================
+# The following test cases demonstrate behavior of generated column replication
+# when publish_generated_colums=false/true:
+#
+# Test: column list includes gencols, when publish_generated_columns=false
+# Test: column list does not include gencols, when
publish_generated_columns=false
+#
+# Test: column list includes gencols, when publish_generated_columns=true
+# Test: column list does not include gencols, when
publish_generated_columns=true
+# Test: no column list, when publish_generated_columns=true
+# =============================================================================

These tests are currently only testing the initial tablesync
replication. Since the COPY logic is different from the normal
replication logic, I think it would be better to test some normal
replication records as well, to make sure both parts work
consistently. This comment applies to all of the following test cases.

~~~

5.
+# Create table and publications.
+$node_publisher->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE nogen_to_gen3 (a int, b int, gen1 int GENERATED ALWAYS
AS (a * 2) STORED, gen2 int GENERATED ALWAYS AS (a * 2) STORED);
+ CREATE TABLE nogen_to_gen4 (c int, d int, gen1 int GENERATED ALWAYS
AS (c * 2) STORED, gen2 int GENERATED ALWAYS AS (c * 2) STORED);
+ INSERT INTO nogen_to_gen3 VALUES (1, 1);
+ INSERT INTO nogen_to_gen4 VALUES (1, 1);
+ CREATE PUBLICATION pub1 FOR table nogen_to_gen3, nogen_to_gen4(gen1)
WITH (publish_generated_columns=true);
+));
+

5a.
The code should do only what the comments say it does. So, the INSERTS
should be done separately after the CREATE PUBLICATION, but before the
CREATE SUBSCRIPTION. A similar change should be made for all of these
test cases.

# Insert some initial data
INSERT INTO nogen_to_gen3 VALUES (1, 1);
INSERT INTO nogen_to_gen4 VALUES (1, 1);

~

5b.
The tables are badly named. Why are they 'nogen_to_gen', when the
publisher side has generated cols and the subscriber side does not?
This problem seems repeated in multiple subsequent test cases.

~

6.
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT * FROM gen_to_nogen ORDER BY a");
+is($result, qq(1|1||2),
+ 'gen_to_nogen initial sync, when publish_generated_columns=false');
+
+$result = $node_subscriber->safe_psql('postgres',
+ "SELECT * FROM gen_to_nogen2 ORDER BY c");
+is($result, qq(1|1||),
+ 'gen_to_nogen2 initial sync, when publish_generated_columns=false');

IMO all the "result" queries like these ones ought to have to have a
comment which explains the reason for the expected results. This
review comment applies to multiple places. Please add comments to all
of them.

~~~

7.
+# --------------------------------------------------
+# Testcase: Publisher replicates the column list data excluding generated
+# columns even though publish_generated_columns option is false.
+# --------------------------------------------------
+

7a.
This is the 2nd test case, but AFAICT it would be far easier to test
this scenario just by making another table (with an appropriate column
list) for the 1st test case.

~

7b.
BTW, I don't understand this test at all. I thought according to the
comment that it intended to use a publication column list with only
normal columns in it. But that is not what the publication looks like
here:
+ CREATE PUBLICATION pub1 FOR table nogen_to_gen, nogen_to_gen2(gen1)
WITH (publish_generated_columns=false);

Indeed, the way it is currently written I didn't see what this test is
doing that is any different from the prior test (???)

~~~

8.
+# --------------------------------------------------
+# Testcase: Although publish_generated_columns is true, publisher publishes
+# only the data of the columns specified in column list, skipping other
+# generated/non-generated columns.
+# --------------------------------------------------

versus

+# --------------------------------------------------
+# Testcase: Publisher publishes only the data of the columns specified in
+# column list skipping other generated/non-generated columns.
+# --------------------------------------------------

Again, I did not understand how these test cases differ from each
other. Surely, those can be combined easily enough just by adding
another table with a different kind of column list.

~~~

9.
+# --------------------------------------------------
+# Testcase: Publisher replicates all columns if publish_generated_columns is
+# enabled and there is no column list
+# --------------------------------------------------
+

Here is yet another test case that AFAICT can just be combined with
other test cases that were using publish_generated_columns=true. It
seems all you need is one extra table with no column list. You don't
need all the extra create/drop pub/sub overheads to test this.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrey M. Borodin 2024-10-10 05:47:25 Re: \watch 0 or \watch 0.00001 doesn't do what I want
Previous Message Alexander Lakhin 2024-10-10 05:00:00 Re: replace strtok()