Re: Pgoutput not capturing the generated columns

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: Shubham Khanna <khannashubham1197(at)gmail(dot)com>
Cc: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Rajendra Kumar Dangwal <dangwalrajendra888(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "euler(at)eulerto(dot)com" <euler(at)eulerto(dot)com>
Subject: Re: Pgoutput not capturing the generated columns
Date: 2024-07-08 05:22:54
Message-ID: CAHut+Pv50FoqfM7f-3SJ441UDzLfG9LqgggLm1Cm0DcGG6VCnQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Here are review comments for v15-0001

======
doc/src/sgml/ddl.sgml

nitpick - there was a comma (,) which should be a period (.)

======
.../libpqwalreceiver/libpqwalreceiver.c

1.
+ if (options->proto.logical.include_generated_columns &&
+ PQserverVersion(conn->streamConn) >= 170000)
+ appendStringInfoString(&cmd, ", include_generated_columns 'true'");
+

Should now say >= 180000

======
src/backend/replication/pgoutput/pgoutput.c

nitpick - comment wording for RelationSyncEntry.collist.

~~

2.
pgoutput_column_list_init:

I found the current logic to be quite confusing. I assume the code is
working OK, because AFAIK there are plenty of tests and they are all
passing, but the logic seems somewhat repetitive and there are also no
comments to explain it adding to my confusion.

IIUC, PRIOR TO THIS PATCH:

BMS field 'columns' represented the "columns of the column list" or it
was NULL if there was no publication column list (and it was also NULL
if the column list contained every column).

IIUC NOW, WITH THIS PATCH:

The BMS field 'columns' meaning is changed slightly to be something
like "columns to be replicated" or NULL if all columns are to be
replicated. This is almost the same thing except we are now handing
the generated columns up-front, so generated columns will or won't
appear in the BMS according to the "include_generated_columns"
parameter. See how this is all a bit subtle which is why copious new
comments are required to explain it...

So, although the test result evidence suggests this is working OK, I
have many questions/issues about it. Here are some to start with:

2a. It needs a lot more (summary and detailed) comments explaining the
logic now that the meaning is slightly different.

2b. What is the story with the FOR ALL TABLES case now? Previously,
there would always be NULL 'columns' for "FOR ALL TABLES" case -- the
comment still says so. But now you've tacked on a 2nd pass of
iterations to build the BMS outside of the "if (!pub->alltables)"
check. Is that OK?

2c. The following logic seemed unexpected:
- if (bms_num_members(cols) == nliveatts)
+ if (bms_num_members(cols) == nliveatts &&
+ data->include_generated_columns)
{
bms_free(cols);
cols = NULL;
`
I had thought the above code would look different -- more like:
if (att->attgenerated && !data->include_generated_columns)
continue;

nliveatts++;
...

2d. Was so much duplicated code necessary? It feels like the whole
"Get the number of live attributes." and assignment of cols to NULL
might be made common to both code paths.

2e. I'm beginning to question the pros/cons of the new BMS logic; I
had suggested trying this way (processing the generated columns
up-front in the BMS 'columns' list) to reduce patch code and simplify
all the subsequent API delegation of "include_generated_cloumns"
everywhere like it was in v14-0001. Indeed, that part was a success
and the patch is now smaller. But I don't like much that we've traded
reduced code overall for increased confusing code in that BMS
function. If all this BMS code can be refactored and commented to be
easier to understand then maybe all will be well, but if it can't then
maybe this BMS change was a bridge too far. I haven't given up on it
just yet, but I wonder what was your opinion about it, and do other
people have thoughts about whether this was the good direction to
take?

======
src/bin/pg_dump/pg_dump.c

3.
+ if (fout->remoteVersion >= 170000)
+ appendPQExpBufferStr(query,
+ " s.subincludegencols\n");
+ else
+ appendPQExpBufferStr(query,
+ " false AS subincludegencols\n");

Should now say >= 180000

======
src/bin/psql/describe.c

4.
+ /* include_generated_columns is only supported in v18 and higher */
+ if (pset.sversion >= 170000)
+ appendPQExpBuffer(&buf,
+ ", subincludegencols AS \"%s\"\n",
+ gettext_noop("Include generated columns"));
+

Should now say >= 180000

======
src/include/catalog/pg_subscription.h

nitpick - let's make the comment the same as in WalRcvStreamOptions

======
src/include/replication/logicalproto.h

nitpick - extern for logicalrep_write_update should be unchanged by this patch

======
src/test/regress/sql/subscription.sql

nitpick = the comment "include_generated_columns and copy_data = true
are mutually exclusive" is not necessary because this all falls under
the existing comment "fail - invalid option combinations"

nitpick - let's explicitly put "copy_data = true" in the CREATE
SUBSCRIPTION to make it more obvious

======
99. Please also refer to the attached 'diffs' patch which implements
all of my nitpicks issues mentioned above.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment Content-Type Size
PS_NITPICKS_20240705_GENCOLS_V150001.txt text/plain 5.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2024-07-08 05:30:23 Re: Pluggable cumulative statistics
Previous Message David Rowley 2024-07-08 05:16:00 Re: SupportRequestRows support function for generate_series_timestamptz