From: | Peter Smith <smithpb2250(at)gmail(dot)com> |
---|---|
To: | Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com> |
Cc: | Shubham Khanna <khannashubham1197(at)gmail(dot)com>, Rajendra Kumar Dangwal <dangwalrajendra888(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, euler(at)eulerto(dot)com |
Subject: | Re: Pgoutput not capturing the generated columns |
Date: | 2024-06-25 06:25:50 |
Message-ID: | CAHut+PuYOE3C_uv1MAD+k7K7wO0Aj1ffOtRAAeoy4FrhD-2T9g@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Here are some review comments for the patch v10-0002.
======
Commit Message
1.
Note that we don't copy columns when the subscriber-side column is also
generated. Those will be filled as normal with the subscriber-side computed or
default data.
~
Now this patch also introduced some errors etc, so I think that patch
comment should be written differently to explicitly spell out
behaviour of every combination, something like the below:
Summary
when (include_generated_column = true)
* publisher not-generated column => subscriber not-generated column:
This is just normal logical replication (not changed by this patch).
* publisher not-generated column => subscriber generated column: This
will give ERROR.
* publisher generated column => subscriber not-generated column: The
publisher generated column value is copied.
* publisher generated column => subscriber generated column: The
publisher generated column value is not copied. The subscriber
generated column will be filled with the subscriber-side computed or
default data.
when (include_generated_columns = false)
* publisher not-generated column => subscriber not-generated column:
This is just normal logical replication (not changed by this patch).
* publisher not-generated column => subscriber generated column: This
will give ERROR.
* publisher generated column => subscriber not-generated column: This
will replicate nothing. Publisher generate-column is not replicated.
The subscriber column will be filled with the subscriber-side default
data.
* publisher generated column => subscriber generated column: This
will replicate nothing. Publisher generate-column is not replicated.
The subscriber generated column will be filled with the
subscriber-side computed or default data.
======
src/backend/replication/logical/relation.c
2.
logicalrep_rel_open:
I tested some of the "missing column" logic, and got the following results:
Scenario A:
PUB
test_pub=# create table t2(a int, b int);
test_pub=# create publication pub2 for table t2;
SUB
test_sub=# create table t2(a int, b int generated always as (a*2) stored);
test_sub=# create subscription sub2 connection 'dbname=test_pub'
publication pub2 with (include_generated_columns = false);
Result:
ERROR: logical replication target relation "public.t2" is missing
replicated column: "b"
~
Scenario B:
PUB/SUB identical to above, but subscription sub2 created "with
(include_generated_columns = true);"
Result:
ERROR: logical replication target relation "public.t2" has a
generated column "b" but corresponding column on source relation is
not a generated column
~~~
2a. Question
Why should we get 2 different error messages for what is essentially
the same problem according to whether the 'include_generated_columns'
is false or true? Isn't the 2nd error message the more correct and
useful one for scenarios like this involving generated columns?
Thoughts?
~
2b. Missing tests?
I also noticed there seems no TAP test for the current "missing
replicated column" message. IMO there should be a new test introduced
for this because the loop involved too much bms logic to go
untested...
======
src/backend/replication/logical/tablesync.c
make_copy_attnamelist:
NITPICK - minor comment tweak
NITPICK - add some spaces after "if" code
3.
Should you pfree the gencollist at the bottom of this function when
you no longer need it, for tidiness?
~~~
4.
static void
-fetch_remote_table_info(char *nspname, char *relname,
+fetch_remote_table_info(char *nspname, char *relname, bool **remotegenlist,
LogicalRepRelation *lrel, List **qual)
{
WalRcvExecResult *res;
StringInfoData cmd;
TupleTableSlot *slot;
Oid tableRow[] = {OIDOID, CHAROID, CHAROID};
- Oid attrRow[] = {INT2OID, TEXTOID, OIDOID, BOOLOID};
+ Oid attrRow[] = {INT2OID, TEXTOID, OIDOID, BOOLOID, BOOLOID};
Oid qualRow[] = {TEXTOID};
bool isnull;
+ bool *remotegenlist_res;
IMO the names 'remotegenlist' and 'remotegenlist_res' should be
swapped the other way around, because it is the function parameter
that is the "result", whereas the 'remotegenlist_res' is just the
local working var for it.
~~~
5. fetch_remote_table_info
Now walrcv_server_version(LogRepWorkerWalRcvConn) is used in multiple
places, I think it will be better to assign this to a 'server_version'
variable to be used everywhere instead of having multiple function
calls.
~~~
6.
"SELECT a.attnum,"
" a.attname,"
" a.atttypid,"
- " a.attnum = ANY(i.indkey)"
+ " a.attnum = ANY(i.indkey),"
+ " a.attgenerated != ''"
" FROM pg_catalog.pg_attribute a"
" LEFT JOIN pg_catalog.pg_index i"
" ON (i.indexrelid = pg_get_replica_identity_index(%u))"
" WHERE a.attnum > 0::pg_catalog.int2"
- " AND NOT a.attisdropped %s"
+ " AND NOT a.attisdropped", lrel->remoteid);
+
+ if ((walrcv_server_version(LogRepWorkerWalRcvConn) >= 120000 &&
+ walrcv_server_version(LogRepWorkerWalRcvConn) <= 160000) ||
+ !MySubscription->includegencols)
+ appendStringInfo(&cmd, " AND a.attgenerated = ''");
+
If the server version is < PG12 then AFAIK there was no such thing as
"a.attgenerated", so shouldn't that SELECT " a.attgenerated != ''"
part also be guarded by some version checking condition like in the
WHERE? Otherwise won't it cause an ERROR for old servers?
~~~
7.
/*
- * For non-tables and tables with row filters, we need to do COPY
- * (SELECT ...), but we can't just do SELECT * because we need to not
- * copy generated columns. For tables with any row filters, build a
- * SELECT query with OR'ed row filters for COPY.
+ * For non-tables and tables with row filters and when
+ * 'include_generated_columns' is specified as 'true', we need to do
+ * COPY (SELECT ...), as normal COPY of generated column is not
+ * supported. For tables with any row filters, build a SELECT query
+ * with OR'ed row filters for COPY.
*/
NITPICK. I felt this was not quite right. AFAIK the reasons for using
this COPY (SELECT ...) syntax is different for row-filters and
generated-columns. Anyway, I updated the comment slightly in my
nitpicks attachment. Please have a look at it to see if you agree with
the suggestions. Maybe I am wrong.
~~~
8.
- for (int i = 0; i < lrel.natts; i++)
+ foreach_ptr(String, att_name, attnamelist)
I'm not 100% sure, but isn't foreach_node the macro to use here,
rather than foreach_ptr?
======
src/test/subscription/t/011_generated.pl
9.
Please discuss with Shubham how to make all the tab1, tab2, tab3,
tab4, tab5, tab6 comments use the same kind of style/wording.
Currently, the patches 0001 and 0002 test comments are a bit
inconsistent.
~~~
10.
Related to above -- now that patch 0002 supports copy_data=true I
don't see why we need to test generated columns *both* for
copy_data=false and also for copy_data=true. IOW, is it really
necessary to have so many tables/tests? For example, I am thinking
some of those tests from patch 0001 can be re-used or just removed now
that copy_data=true works.
~~~
NITPICK - minor comment tweak
~~~
11.
For tab4 and tab6 I saw the initial sync and normal replication data
tests are all merged together, but I had expected to see the initial
sync and normal replication data tests separated so it would be
consistent with the earlier tab1, tab2, tab3 tests.
======
99.
Also, I have attached a nitpicks diff for some of the cosmetic review
comments mentioned above. Please apply whatever of these that you
agree with.
======
Kind Regards,
Peter Smith.
Fujitsu Australia
Attachment | Content-Type | Size |
---|---|---|
PS_NITPICKS_20240625_V100002.txt | text/plain | 2.8 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Fujii.Yuki@df.MitsubishiElectric.co.jp | 2024-06-25 06:33:07 | RE: Partial aggregates pushdown |
Previous Message | Amit Kapila | 2024-06-25 06:24:54 | Re: speed up a logical replica setup |