Re: Pgoutput not capturing the generated columns

From: Shubham Khanna <khannashubham1197(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Rajendra Kumar Dangwal <dangwalrajendra888(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, euler(at)eulerto(dot)com
Subject: Re: Pgoutput not capturing the generated columns
Date: 2024-09-20 11:45:27
Message-ID: CAHv8RjKkoaS1oMsFvPRFB9nPSVC5p_D4Kgq5XB9Y2B2xU7smbA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Sep 11, 2024 at 8:55 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Here are a some more review comments for patch v30-0001.
>
> ======
> src/sgml/ref/create_publication.sgml
>
> 1.
> + <para>
> + If the publisher-side column is also a generated column
> then this option
> + has no effect; the publisher column will be filled as normal with the
> + publisher-side computed or default data.
> + </para>
>
> It should say "subscriber-side"; not "publisher-side". The same was
> already reported by Sawada-San [1].
>
> ~~~
>
> 2.
> + <para>
> + This parameter can only be set <literal>true</literal> if
> <literal>copy_data</literal> is
> + set to <literal>false</literal>.
> + </para>
>
> IMO this limitation should be addressed by patch 0001 like it was
> already done in the previous patches (e.g. v22-0002). I think
> Sawada-san suggested the same [1].
>
> Anyway, 'copy_data' is not a PUBLICATION option, so the fact it is
> mentioned like this without any reference to the SUBSCRIPTION seems
> like a cut/paste error from the previous implementation.
>
> ======
> src/backend/catalog/pg_publication.c
>
> 3. pub_collist_validate
> - if (TupleDescAttr(tupdesc, attnum - 1)->attgenerated)
> - ereport(ERROR,
> - errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
> - errmsg("cannot use generated column \"%s\" in publication column list",
> - colname));
> -
>
> Instead of just removing this ERROR entirely here, I thought it would
> be more user-friendly to give a WARNING if the PUBLICATION's explicit
> column list includes generated cols when the option
> "publish_generated_columns" is false. This combination doesn't seem
> like something a user would do intentionally, so just silently
> ignoring it (like the current patch does) is likely going to give
> someone unexpected results/grief.
>
> ======
> src/backend/replication/logical/proto.c
>
> 4. logicalrep_write_tuple, and logicalrep_write_attrs:
>
> - if (att->attisdropped || att->attgenerated)
> + if (att->attisdropped)
> continue;
>
> Why aren't you also checking the new PUBLICATION option here and
> skipping all gencols if the "publish_generated_columns" option is
> false? Or is the BMS of pgoutput_column_list_init handling this case?
> Maybe there should be an Assert for this?
>
> ======
> src/backend/replication/pgoutput/pgoutput.c
>
> 5. send_relation_and_attrs
>
> - if (att->attisdropped || att->attgenerated)
> + if (att->attisdropped)
> continue;
>
> Same question as #4.
>
> ~~~
>
> 6. prepare_all_columns_bms and pgoutput_column_list_init
>
> + if (att->attgenerated && !pub->pubgencolumns)
> + cols = bms_del_member(cols, i + 1);
>
> IIUC, the algorithm seems overly tricky filling the BMS with all
> columns, before straight away conditionally removing the generated
> columns. Can't it be refactored to assign all the correct columns
> up-front, to avoid calling bms_del_member()?
>
> ======
> src/bin/pg_dump/pg_dump.c
>
> 7. getPublications
>
> IIUC, there is lots of missing SQL code here (for all older versions)
> that should be saying "false AS pubgencolumns".
> e.g. compare the SQL with how "false AS pubviaroot" is used.
>
> ======
> src/bin/pg_dump/t/002_pg_dump.pl
>
> 8. Missing tests?
>
> I expected to see a pg_dump test for this new PUBLICATION option.
>
> ======
> src/test/regress/sql/publication.sql
>
> 9. Missing tests?
>
> How about adding another test case that checks this new option must be
> "Boolean"?
>
> ~~~
>
> 10. Missing tests?
>
> --- error: generated column "d" can't be in list
> +-- ok: generated columns can be in the list too
> ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (a, d);
> +ALTER PUBLICATION testpub_fortable DROP TABLE testpub_tbl5;
>
> (see my earlier comment #3)
>
> IMO there should be another test case for a WARNING here if the user
> attempts to include generated column 'd' in an explicit PUBLICATION
> column list while the "publish_generated-columns" is false.
>
> ======
> [1] https://www.postgresql.org/message-id/CAD21AoA-tdTz0G-vri8KM2TXeFU8RCDsOpBXUBCgwkfokF7%3DjA%40mail.gmail.com
>

I have fixed all the comments. The attached patches contain the desired changes.
Also the merging of 0001 and 0002 can be done once there are no
comments on the patch to help in reviewing.

Thanks and Regards,
Shubham Khanna.

Attachment Content-Type Size
v32-0002-Support-replication-of-generated-column-during-i.patch application/octet-stream 18.4 KB
v32-0001-Enable-support-for-publish_generated_columns-opt.patch application/octet-stream 82.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Shubham Khanna 2024-09-20 11:51:59 Re: Pgoutput not capturing the generated columns
Previous Message Amit Kapila 2024-09-20 10:54:15 Re: Documentation to upgrade logical replication cluster