Re: Pgoutput not capturing the generated columns

From: Shubham Khanna <khannashubham1197(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Rajendra Kumar Dangwal <dangwalrajendra888(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "euler(at)eulerto(dot)com" <euler(at)eulerto(dot)com>
Subject: Re: Pgoutput not capturing the generated columns
Date: 2024-10-16 18:01:57
Message-ID: CAHv8RjLQAUvZKZykFttUzBg0bW3pjieRGfV88-WZThw70WY-YQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Oct 9, 2024 at 11:13 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Hi, here are my review comments for patch v37-0001.
>
> ======
> Commit message
>
> 1.
> Example usage of subscription option:
> CREATE PUBLICATION FOR TABLE tab_gencol WITH (publish_generated_columns
> = true);
>
> ~
>
> This is wrong -- it's not a "subscription option". Better to just say
> "Example usage:"
>
> ~~~
>
> 2.
> When 'copy_data' is true, during the initial sync, the data is replicated from
> the publisher to the subscriber using the COPY command. The normal COPY
> command does not copy generated columns, so when 'publish_generated_columns'
> is true...
>
> ~
>
> By only mentioning the "when ... is true" case this description does
> not cover the scenario when 'publish_generated_columns' is false when
> the publication column list has a generated column.
>
> ~~~
>
> 3.
> typo - /replication of generated column/replication of generated columns/
> typo - /filed/filled/
> typo - 'pg_publicataion' catalog
>
> ======
> src/backend/replication/logical/tablesync.c
>
> make_copy_attnamelist:
> 4.
> nit - missing word in a comment
>
> ~~~
>
> fetch_remote_table_info:
> 5.
> + appendStringInfo(&cmd,
> " FROM pg_catalog.pg_attribute a"
> " LEFT JOIN pg_catalog.pg_index i"
> " ON (i.indexrelid = pg_get_replica_identity_index(%u))"
> " WHERE a.attnum > 0::pg_catalog.int2"
> - " AND NOT a.attisdropped %s"
> + " AND NOT a.attisdropped", lrel->remoteid);
> +
> + appendStringInfo(&cmd,
> " AND a.attrelid = %u"
> " ORDER BY a.attnum",
> - lrel->remoteid,
> - (walrcv_server_version(LogRepWorkerWalRcvConn) >= 120000 ?
> - "AND a.attgenerated = ''" : ""),
> lrel->remoteid);
>
> Version v37-0001 has removed a condition previously between these two
> appendStringInfo's. But, that now means there is no reason to keep
> these statements separated. These should be combined now to use one
> appendStringInfo.
>
> ~
>
> 6.
> + if (server_version >= 120000)
> + remotegenlist[natt] = DatumGetBool(slot_getattr(slot, 5, &isnull));
> +
>
> Are you sure the version check for 120000 is correct? IIUC, this 5
> matches the 'attgenerated' column, but the SQL for that was
> constructed using a different condition:
> if (server_version >= 180000)
> appendStringInfo(&cmd, ", a.attgenerated != ''");
>
> It is this 120000 versus 180000 difference that makes me suspicious of
> a potential mistake.
>
> ~~~
>
> 7.
> + /*
> + * If the column is generated and neither the generated column option
> + * is specified nor it appears in the column list, we will skip it.
> + */
> + if (remotegenlist[natt] && !has_pub_with_pubgencols &&
> + !bms_is_member(attnum, included_cols))
> + {
> + ExecClearTuple(slot);
> + continue;
> + }
>
> 7b.
> I am also suspicious about how this condition interacts with the other
> condition (shown below) that came earlier:
> /* If the column is not in the column list, skip it. */
> if (included_cols != NULL && !bms_is_member(attnum, included_cols))
>
> Something doesn't seem right. e.g. If we can only get here by passing
> the earlier condition, then it means we already know the generated
> condition was *not* a member of a column list.... in which case that
> should affect this new condition and the new comment too.
>
> ======
> src/backend/replication/pgoutput/pgoutput.c
>
> pgoutput_column_list_init:
>
> 8.
> /*
> - * If the publication is FOR ALL TABLES then it is treated the same as
> - * if there are no column lists (even if other publications have a
> - * list).
> + * Process potential column lists for the following cases: a. Any
> + * publication that is not FOR ALL TABLES. b. When the publication is
> + * FOR ALL TABLES and 'publish_generated_columns' is false. FOR ALL
> + * TABLES publication doesn't have user-defined column lists, so all
> + * columns will be replicated by default. However, if
> + * 'publish_generated_columns' is set to false, column lists must
> + * still be created to exclude any generated columns from being
> + * published.
> */
>
> nit - please reformat this comment so the bullets are readable
>

I have fixed all the comments and posted the v39 patches for them.
Please refer to the updated v39 Patches here in [1]. See [1] for the
changes added.

[1] https://www.postgresql.org/message-id/CAHv8RjLjb%2B98i5ZQUphivxdOZ3hSGLfq2SiWQetUvk8zGyAQwQ%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2024-10-16 18:02:42 Re: Avoiding superfluous buffer locking during nbtree backwards scans
Previous Message Shubham Khanna 2024-10-16 17:59:25 Re: Pgoutput not capturing the generated columns