Re: Pgoutput not capturing the generated columns

From: Shubham Khanna <khannashubham1197(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Rajendra Kumar Dangwal <dangwalrajendra888(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, euler(at)eulerto(dot)com
Subject: Re: Pgoutput not capturing the generated columns
Date: 2024-10-28 13:07:26
Message-ID: CAHv8RjLvr8ZxX-1TcaxrZns1nwgrVUTO_2jhDdOPys0WgrDyKQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Oct 28, 2024 at 7:43 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Hi, here are my review comments for patch v43-0001.
>
> ======
>
> 1. Missing docs update?
>
> The CREATE PUBLICATION docs currently says:
> When a column list is specified, only the named columns are
> replicated. If no column list is specified, all columns of the table
> are replicated through this publication, including any columns added
> later.
>
> ~
>
> For this patch, should that be updated to say "... all columns (except
> generated columns) of the table are replicated..."
>
> ======
> src/backend/replication/logical/proto.c
>
> 2.
> +static bool
> +should_publish_column(Form_pg_attribute att, Bitmapset *columns)
> +{
> + if (att->attisdropped)
> + return false;
> +
> + /*
> + * Skip publishing generated columns if they are not included in the
> + * column list.
> + */
> + if (att->attgenerated && !columns)
> + return false;
> +
> + if (!column_in_column_list(att->attnum, columns))
> + return false;
> +
> + return true;
> +}
>
> Here, I wanted to suggest that the whole "Skip publishing generated
> columns" if-part is unnecessary because the next check
> (!column_in_column_list) is going to return false for the same
> scenario anyhow.
>
> But, unfortunately, the "column_in_column_list" function has some
> special NULL handling logic in it; this means none of this code is
> quite what it seems to be (e.g. the function name
> column_in_column_list is somewhat misleading)
>
> IMO it would be better to change the column_in_column_list signature
> -- add another boolean param to say if a NULL column list is allowed
> or not. That will remove any subtle behaviour and then you can remove
> the "if (att->attgenerated && !columns)" part.
>
> ======
> src/backend/replication/pgoutput/pgoutput.c
>
> 3. send_relation_and_attrs
>
> - if (att->attisdropped || att->attgenerated)
> + if (att->attisdropped)
> continue;
>
> if (att->atttypid < FirstGenbkiObjectId)
> continue;
>
> + /*
> + * Skip publishing generated columns if they are not included in the
> + * column list.
> + */
> + if (att->attgenerated && !columns)
> + continue;
> +
> /* Skip this attribute if it's not present in the column list */
> if (columns != NULL && !bms_is_member(att->attnum, columns))
> continue;
> ~
>
> Most of that code above looks to be doing the very same thing as the
> new 'should_publish_column' in proto.c. Won't it be better to expose
> the other function and share the common logic?
>
> ~~~
>
> 4. pgoutput_column_list_init
>
> - if (att->attisdropped || att->attgenerated)
> + if (att->attisdropped)
> continue;
>
> + if (att->attgenerated)
> + {
> + if (bms_is_member(att->attnum, cols))
> + gencolpresent = true;
> +
> + continue;
> + }
> +
> nliveatts++;
> }
>
> /*
> - * If column list includes all the columns of the table,
> - * set it to NULL.
> + * If column list includes all the columns of the table
> + * and there are no generated columns, set it to NULL.
> */
> - if (bms_num_members(cols) == nliveatts)
> + if (bms_num_members(cols) == nliveatts && !gencolpresent)
> {
>
> Something seems not quite right (or maybe redundant) with this logic.
> For example, because you unconditionally 'continue' for generated
> columns, then AFAICT it is just not possible for bms_num_members(cols)
> == nliveatts and at the same time 'gencolpresent' to be true. So you
> could've just Asserted (!gencolpresent) instead of checking it in the
> condition and mentioning it in the comment.
>
> ======
> src/test/regress/expected/publication.out
>
> 5.
> --- error: generated column "d" can't be in list
> +-- ok: generated column "d" can be in the list too
> ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (a, d);
> -ERROR: cannot use generated column "d" in publication column list
>
> By allowing the above to work without giving ERROR, I think you've
> broken many subsequent test expected results. e.g. I don't trust these
> "expected" results anymore because I didn't think these next test
> errors should have been affected, right?
>
> -- error: system attributes "ctid" not allowed in column list
> ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (a, ctid);
> -ERROR: cannot use system column "ctid" in publication column list
> +ERROR: relation "testpub_tbl5" is already member of publication
> "testpub_fortable"
>
> Hmm - looks like a wrong expected result to me.
>
> ~
>
> -- error: duplicates not allowed in column list
> ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (a, a);
> -ERROR: duplicate column "a" in publication column list
> +ERROR: relation "testpub_tbl5" is already member of publication
> "testpub_fortable"
>
> Hmm - looks like a wrong expected result to me.
>
> probably more like this...
>
> ======
> src/test/subscription/t/031_column_list.pl
>
> 6.
> +$node_subscriber->safe_psql(
> + 'postgres', qq(
> + CREATE TABLE test_gen (a int PRIMARY KEY, b int);
> +));
> +
> +$node_subscriber->safe_psql(
> + 'postgres', qq(
> + CREATE SUBSCRIPTION sub_gen CONNECTION '$publisher_connstr'
> PUBLICATION pub_gen;
> +));
>
> Should combine these.
>
> ~~~
>
> 7.
> +$node_publisher->wait_for_catchup('sub_gen');
> +
> +is( $node_subscriber->safe_psql(
> + 'postgres', "SELECT * FROM test_gen ORDER BY a"),
> + qq(1|2),
> + 'replication with generated columns in column list');
> +
>
> But, this is only testing normal replication. You should also include
> some initial table data so you can test that the initial table
> synchronization works too. Otherwise, I think current this patch has
> no proof that the initial 'copy_data' even works at all.
>

All the agreed comments have been addressed. Please find the attached
v44 Patches for the required changes.

Thanks and Regards,
Shubham Khanna.

Attachment Content-Type Size
v44-0001-Support-logical-replication-of-generated-columns.patch application/octet-stream 14.2 KB
v44-0002-Support-copy-generated-column-during-table-sync.patch application/octet-stream 6.8 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Shubham Khanna 2024-10-28 13:10:51 Re: Pgoutput not capturing the generated columns
Previous Message torikoshia 2024-10-28 13:05:03 Re: RFC: Logging plan of the running query