Re: Pgoutput not capturing the generated columns

From: Shubham Khanna <khannashubham1197(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Rajendra Kumar Dangwal <dangwalrajendra888(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, euler(at)eulerto(dot)com
Subject: Re: Pgoutput not capturing the generated columns
Date: 2024-09-20 11:55:57
Message-ID: CAHv8RjJ5_dmyCH58xQ0StXMdPt9gstemMMWytR79+LfOMAHdLw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Sep 18, 2024 at 8:58 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Hi, here are my review comments for patch v31-0002.
>
> ======
>
> 1. General.
>
> IMO patches 0001 and 0002 should be merged when next posted. IIUC the
> reason for the split was only because there were 2 different authors
> but that seems to be not relevant anymore.
>
> ======
> Commit message
>
> 2.
> When 'copy_data' is true, during the initial sync, the data is replicated from
> the publisher to the subscriber using the COPY command. The normal COPY
> command does not copy generated columns, so when 'publish_generated_columns'
> is true, we need to copy using the syntax:
> 'COPY (SELECT column_name FROM table_name) TO STDOUT'.
>
> ~
>
> 2a.
> Should clarify that 'copy_data' is a SUBSCRIPTION parameter.
>
> 2b.
> Should clarify that 'publish_generated_columns' is a PUBLICATION parameter.
>
> ======
> src/backend/replication/logical/tablesync.c
>
> make_copy_attnamelist:
>
> 3.
> - for (i = 0; i < rel->remoterel.natts; i++)
> + desc = RelationGetDescr(rel->localrel);
> + localgenlist = palloc0(rel->remoterel.natts * sizeof(bool));
>
> Each time I review this code I am tricked into thinking it is wrong to
> use rel->remoterel.natts here for the localgenlist. AFAICT the code is
> actually fine because you do not store *all* the subscriber gencols in
> 'localgenlist' -- you only store those with matching names on the
> publisher table. It might be good if you could add an explanatory
> comment about that to prevent any future doubts.
>
> ~~~
>
> 4.
> + if (!remotegenlist[remote_attnum])
> + ereport(ERROR,
> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("logical replication target relation \"%s.%s\" has a
> generated column \"%s\" "
> + "but corresponding column on source relation is not a generated column",
> + rel->remoterel.nspname, rel->remoterel.relname, NameStr(attr->attname))));
>
> This error message has lots of good information. OTOH, I think when
> copy_data=false the error would report the subscriber column just as
> "missing", which is maybe less helpful. Perhaps that other
> copy_data=false "missing" case can be improved to share the same error
> message that you have here.
>

This comment is still open. Will fix this in the next set of patches.

> ~~~
>
> fetch_remote_table_info:
>
> 5.
> IIUC, this logic needs to be more sophisticated to handle the case
> that was being discussed earlier with Sawada-san [1]. e.g. when the
> same table has gencols but there are multiple subscribed publications
> where the 'publish_generated_columns' parameter differs.
>
> Also, you'll need test cases for this scenario, because it is too
> difficult to judge correctness just by visual inspection of the code.
>
> ~~~~
>
> 6.
> nit - Change 'hasgencolpub' to 'has_pub_with_pubgencols' for
> readability, and initialize it to 'false' to make it easy to use
> later.
>
> ~~~
>
> 7.
> - * Get column lists for each relation.
> + * Get column lists for each relation and check if any of the publication
> + * has generated column option.
>
> and
>
> + /* Check if any of the publication has generated column option */
> + if (server_version >= 180000)
>
> nit - tweak the comments to name the publication parameter properly.
>
> ~~~
>
> 8.
> foreach(lc, MySubscription->publications)
> {
> if (foreach_current_index(lc) > 0)
> appendStringInfoString(&pub_names, ", ");
> appendStringInfoString(&pub_names, quote_literal_cstr(strVal(lfirst(lc))));
> }
>
> I know this is existing code, but shouldn't all this be done by using
> the purpose-built function 'get_publications_str'
>
> ~~~
>
> 9.
> + ereport(ERROR,
> + errcode(ERRCODE_CONNECTION_FAILURE),
> + errmsg("could not fetch gencolumns information from publication list: %s",
> + pub_names.data));
>
> and
>
> + errcode(ERRCODE_UNDEFINED_OBJECT),
> + errmsg("failed to fetch tuple for gencols from publication list: %s",
> + pub_names.data));
>
> nit - /gencolumns information/generated column publication
> information/ to make the errmsg more human-readable
>
> ~~~
>
> 10.
> + bool gencols_allowed = server_version >= 180000 && hasgencolpub;
> +
> + if (!gencols_allowed)
> + appendStringInfo(&cmd, " AND a.attgenerated = ''");
>
> Can the 'gencols_allowed' var be removed, and the condition just be
> replaced with if (!has_pub_with_pubgencols)? It seems equivalent
> unless I am mistaken.
>
> ======
>
> Please refer to the attachment which implements some of the nits
> mentioned above.
>
> ======
> [1] https://www.postgresql.org/message-id/CAD21AoBun9crSWaxteMqyu8A_zme2ppa2uJvLJSJC2E3DJxQVA%40mail.gmail.com
>

I have addressed the comments in the v32-0002 Patch. Please refer to
the updated v32-0002 Patch here in [1]. See [1] for the changes added.

[1] https://www.postgresql.org/message-id/CAHv8RjKkoaS1oMsFvPRFB9nPSVC5p_D4Kgq5XB9Y2B2xU7smbA%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2024-09-20 12:00:20 Re: pgsql: Implement pg_wal_replay_wait() stored procedure
Previous Message Shubham Khanna 2024-09-20 11:53:50 Re: Pgoutput not capturing the generated columns