Re: Pgoutput not capturing the generated columns

From: Shubham Khanna <khannashubham1197(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Rajendra Kumar Dangwal <dangwalrajendra888(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, euler(at)eulerto(dot)com
Subject: Re: Pgoutput not capturing the generated columns
Date: 2024-06-19 12:00:06
Message-ID: CAHv8RjLeZtTeXpFdoY6xCPO41HtuOPMSSZgshVdb+V=p2YHL8Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jun 17, 2024 at 1:57 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Hi, here are my review comments for patch v7-0001.
>
> ======
> 1. GENERAL - \dRs+
>
> Shouldn't the new SUBSCRIPTION parameter be exposed via "describe"
> (e.g. \dRs+ mysub) the same as the other boolean parameters?
>
> ======
> Commit message
>
> 2.
> When 'include_generated_columns' is false then the PUBLICATION
> col-list will ignore any generated cols even when they are present in
> a PUBLICATION col-list
>
> ~
>
> Maybe you don't need to mention "PUBLICATION col-list" twice.
>
> SUGGESTION
> When 'include_generated_columns' is false, generated columns are not
> replicated, even when present in a PUBLICATION col-list.
>
> ~~~
>
> 2.
> CREATE SUBSCRIPTION test1 connection 'dbname=postgres host=localhost port=9999
> 'publication pub1;
>
> ~
>
> 2a.
> (I've questioned this one in previous reviews)
>
> What exactly is the purpose of this statement in the commit message?
> Was this supposed to demonstrate the usage of the
> 'include_generated_columns' parameter?
>
> ~
>
> 2b.
> /publication/ PUBLICATION/
>
>
> ~~~
>
> 3.
> If the subscriber-side column is also a generated column then
> thisoption has no effect; the replicated data will be ignored and the
> subscriber column will be filled as normal with the subscriber-side
> computed or default data.
>
> ~
>
> Missing space: /thisoption/this option/
>
> ======
> .../expected/decoding_into_rel.out
>
> 4.
> +-- When 'include-generated-columns' is not set
> +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
> + data
> +-------------------------------------------------------------
> + BEGIN
> + table public.gencoltable: INSERT: a[integer]:1 b[integer]:2
> + table public.gencoltable: INSERT: a[integer]:2 b[integer]:4
> + table public.gencoltable: INSERT: a[integer]:3 b[integer]:6
> + COMMIT
> +(5 rows)
>
> Why is the default value here equivalent to
> 'include-generated-columns' = '1' here instead of '0'? The default for
> the CREATE SUBSCRIPTION parameter 'include_generated_columns' is
> false, and IMO it seems confusing for these 2 defaults to be
> different. Here I think it should default to '0' *regardless* of what
> the previous functionality might have done -- e.g. this is a "test
> decoder" so the parameter should behave sensibly.
>
> ======
> .../test_decoding/sql/decoding_into_rel.sql
>
> NITPICK - wrong comments.
>
> ======
> doc/src/sgml/protocol.sgml
>
> 5.
> + <varlistentry>
> + <term>include_generated_columns</term>
> + <listitem>
> + <para>
> + Boolean option to enable generated columns. This option controls
> + whether generated columns should be included in the string
> + representation of tuples during logical decoding in PostgreSQL.
> + The default is false.
> + </para>
> + </listitem>
> + </varlistentry>
> +
>
> Does the protocol version need to be bumped to support this new option
> and should that be mentioned on this page similar to how all other
> version values are mentioned?

I already did the Backward Compatibility test earlier and decided that
protocol bump is not needed.

> doc/src/sgml/ref/create_subscription.sgml
>
> NITPICK - some missing words/sentence.
> NITPICK - some missing <literal> tags.
> NITPICK - remove duplicated sentence.
> NITPICK - add another <para>.
>
> ======
> src/backend/commands/subscriptioncmds.c
>
> 6.
> #define SUBOPT_ORIGIN 0x00008000
> +#define SUBOPT_include_generated_columns 0x00010000
>
> Please use UPPERCASE for consistency with other macros.
>
> ======
> .../libpqwalreceiver/libpqwalreceiver.c
>
> 7.
> + if (options->proto.logical.include_generated_columns &&
> + PQserverVersion(conn->streamConn) >= 170000)
> + appendStringInfoString(&cmd, ", include_generated_columns 'on'");
> +
>
> IMO it makes more sense to say 'true' here instead of 'on'. It seems
> like this was just cut/paste from the above code (where 'on' was
> sensible).
>
> ======
> src/include/catalog/pg_subscription.h
>
> 8.
> @@ -98,6 +98,8 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId)
> BKI_SHARED_RELATION BKI_ROW
> * slots) in the upstream database are enabled
> * to be synchronized to the standbys. */
>
> + bool subincludegencol; /* True if generated columns must be published */
> +
>
> Not fixed as claimed. This field name ought to be plural.
>
> /subincludegencol/subincludegencols/
>
> ~~~
>
> 9.
> char *origin; /* Only publish data originating from the
> * specified origin */
> + bool includegencol; /* publish generated column data */
> } Subscription;
>
> Not fixed as claimed. This field name ought to be plural.
>
> /includegencol/includegencols/
>
> ======
> src/test/subscription/t/031_column_list.pl
>
> 10.
> +$node_publisher->safe_psql('postgres',
> + "CREATE TABLE tab2 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a
> * 2) STORED)"
> +);
> +
> +$node_publisher->safe_psql('postgres',
> + "CREATE TABLE tab3 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a
> + 10) STORED)"
> +);
> +
> $node_subscriber->safe_psql('postgres',
> "CREATE TABLE tab1 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a
> * 22) STORED, c int)"
> );
>
> +$node_subscriber->safe_psql('postgres',
> + "CREATE TABLE tab2 (a int PRIMARY KEY, b int)"
> +);
> +
> +$node_subscriber->safe_psql('postgres',
> + "CREATE TABLE tab3 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a
> + 20) STORED)"
> +);
>
> IMO the test needs lots more comments to describe what it is doing:
>
> For example, the setup deliberately has made:
> * publisher-side tab2 has generated col 'b' but subscriber-side tab2
> has NON-gnerated col 'b'.
> * publisher-side tab3 has generated col 'b' but subscriber-side tab2
> has DIFFERENT COMPUTATION generated col 'b'.
>
> So it will be better to have comments to explain all this instead of
> having to figure it out.
>
> ~~~
>
> 11.
> # data for initial sync
>
> $node_publisher->safe_psql('postgres',
> "INSERT INTO tab1 (a) VALUES (1), (2), (3)");
> +$node_publisher->safe_psql('postgres',
> + "INSERT INTO tab2 (a) VALUES (1), (2), (3)");
>
> $node_publisher->safe_psql('postgres',
> - "CREATE PUBLICATION pub1 FOR ALL TABLES");
> + "CREATE PUBLICATION pub1 FOR TABLE tab1");
> +$node_publisher->safe_psql('postgres',
> + "CREATE PUBLICATION pub2 FOR TABLE tab2");
> +$node_publisher->safe_psql('postgres',
> + "CREATE PUBLICATION pub3 FOR TABLE tab3");
> +
>
> # Wait for initial sync of all subscriptions
> $node_subscriber->wait_for_subscription_sync;
>
> my $result = $node_subscriber->safe_psql('postgres', "SELECT a, b FROM tab1");
> is( $result, qq(1|22
> 2|44
> 3|66), 'generated columns initial sync');
>
> ~
>
> IMO (and for completeness) it would be better to INSERT data for all
> the tables and alsot to validate that tables tab2 and tab3 has zero
> rows replicated. Yes, I know there is 'copy_data=false', but it is
> just easier to see all the tables instead of guessing why some are
> omitted, and anyway this test case will be needed after the next patch
> implements the COPY support for gen-cols.
>
> ~~~
>
> 12.
> +$node_publisher->safe_psql('postgres', "INSERT INTO tab2 VALUES (4), (5)");
> +
> +$node_publisher->wait_for_catchup('sub2');
> +
> +$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab2");
> +is( $result, qq(4|8
> +5|10), 'generated columns replicated to non-generated column on subscriber');
> +
> +$node_publisher->safe_psql('postgres', "INSERT INTO tab3 VALUES (4), (5)");
> +
> +$node_publisher->wait_for_catchup('sub3');
> +
> +$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab3");
> +is( $result, qq(4|24
> +5|25), 'generated columns replicated to non-generated column on subscriber');
> +
>
> Here also I think there should be explicit comments about what these
> cases are testing, what results you are expecting, and why. The
> comments will look something like the message parameter of those
> safe_psql(...)
>
> e.g.
> # confirm generated columns ARE replicated when the subscriber-side
> column is not generated
>
> e.g.
> # confirm generated columns are NOT replicated when the
> subscriber-side column is also generated
>
> ======
>
> 99.
> Please also see my nitpicks attachment patch for various other
> cosmetic and docs problems, and apply theseif you agree:
> - documentation wording/rendering
> - wrong comments
> - spacing
> - etc.

All the comments are handled.

Patch v8-0001 contains all the changes required. See [1] for the changes added.

[1] https://www.postgresql.org/message-id/CAHv8Rj%2BAi0CgtXiAga82bWpWB8fVcOWycNyJ_jqXm788v3R8rQ%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2024-06-19 12:04:15 Re: State of pg_createsubscriber
Previous Message Tomas Vondra 2024-06-19 11:55:35 Re: Parallel CREATE INDEX for GIN indexes