Re: Pgoutput not capturing the generated columns

From: Shubham Khanna <khannashubham1197(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Rajendra Kumar Dangwal <dangwalrajendra888(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, euler(at)eulerto(dot)com
Subject: Re: Pgoutput not capturing the generated columns
Date: 2024-09-13 11:34:06
Message-ID: CAHv8Rj+inrG6EU0rpDJxih8mmYLhCUP6ouTAmMN2RDnT9tE_Gg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Sep 10, 2024 at 2:51 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Mon, Sep 9, 2024 at 2:38 AM Shubham Khanna
> <khannashubham1197(at)gmail(dot)com> wrote:
> >
> > On Thu, Aug 29, 2024 at 11:46 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Thu, Aug 29, 2024 at 8:44 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > >
> > > > On Wed, Aug 28, 2024 at 1:06 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > > >
> > > > > On Mon, May 20, 2024 at 1:49 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > > > >
> > > > > > As Euler mentioned earlier, I think it's a decision not to replicate
> > > > > > generated columns because we don't know the target table on the
> > > > > > subscriber has the same expression and there could be locale issues
> > > > > > even if it looks the same. I can see that a benefit of this proposal
> > > > > > would be to save cost to compute generated column values if the user
> > > > > > wants the target table on the subscriber to have exactly the same data
> > > > > > as the publisher's one. Are there other benefits or use cases?
> > > > > >
> > > > >
> > > > > The cost is one but the other is the user may not want the data to be
> > > > > different based on volatile functions like timeofday()
> > > >
> > > > Shouldn't the generation expression be immutable?
> > > >
> > >
> > > Yes, I missed that point.
> > >
> > > > > or the table on
> > > > > subscriber won't have the column marked as generated.
> > > >
> > > > Yeah, it would be another use case.
> > > >
> > >
> > > Right, apart from that I am not aware of other use cases. If they
> > > have, I would request Euler or Rajendra to share any other use case.
> > >
> > > > > Now, considering
> > > > > such use cases, is providing a subscription-level option a good idea
> > > > > as the patch is doing? I understand that this can serve the purpose
> > > > > but it could also lead to having the same behavior for all the tables
> > > > > in all the publications for a subscription which may or may not be
> > > > > what the user expects. This could lead to some performance overhead
> > > > > (due to always sending generated columns for all the tables) for cases
> > > > > where the user needs it only for a subset of tables.
> > > >
> > > > Yeah, it's a downside and I think it's less flexible. For example, if
> > > > users want to send both tables with generated columns and tables
> > > > without generated columns, they would have to create at least two
> > > > subscriptions.
> > > >
> > >
> > > Agreed and that would consume more resources.
> > >
> > > > Also, they would have to include a different set of
> > > > tables to two publications.
> > > >
> > > > >
> > > > > I think we should consider it as a table-level option while defining
> > > > > publication in some way. A few ideas could be: (a) We ask users to
> > > > > explicitly mention the generated column in the columns list while
> > > > > defining publication. This has a drawback such that users need to
> > > > > specify the column list even when all columns need to be replicated.
> > > > > (b) We can have some new syntax to indicate the same like: CREATE
> > > > > PUBLICATION pub1 FOR TABLE t1 INCLUDE GENERATED COLS, t2, t3, t4
> > > > > INCLUDE ..., t5;. I haven't analyzed the feasibility of this, so there
> > > > > could be some challenges but we can at least investigate it.
> > > >
> > > > I think we can create a publication for a single table, so what we can
> > > > do with this feature can be done also by the idea you described below.
> > > >
> > > > > Yet another idea is to keep this as a publication option
> > > > > (include_generated_columns or publish_generated_columns) similar to
> > > > > "publish_via_partition_root". Normally, "publish_via_partition_root"
> > > > > is used when tables on either side have different partitions
> > > > > hierarchies which is somewhat the case here.
> > > >
> > > > It sounds more useful to me.
> > > >
> > >
> > > Fair enough. Let's see if anyone else has any preference among the
> > > proposed methods or can think of a better way.
> >
> > I have fixed the current issue. I have added the option
> > 'publish_generated_columns' to the publisher side and created the new
> > test cases accordingly.
> > The attached patches contain the desired changes.
> >
>
> Thank you for updating the patches. I have some comments:
>
> Do we really need to add this option to test_decoding? I think it
> would be good if this improves the test coverage. Otherwise, I'm not
> sure we need this part. If we want to add it, I think it would be
> better to have it in a separate patch.
>

I have removed the option from the test_decoding file.

> ---
> + <para>
> + If the publisher-side column is also a generated column
> then this option
> + has no effect; the publisher column will be filled as normal with the
> + publisher-side computed or default data.
> + </para>
>
> I don't understand this description. Why does this option have no
> effect if the publisher-side column is a generated column?
>

The documentation was incorrect. Currently, replicating from a
publisher table with a generated column to a subscriber table with a
generated column will result in an error. This has now been updated.

> ---
> + <para>
> + This parameter can only be set <literal>true</literal> if
> <literal>copy_data</literal> is
> + set to <literal>false</literal>.
> + </para>
>
> If I understand this patch correctly, it doesn't disallow to set
> copy_data to true when the publish_generated_columns option is
> specified. But do we want to disallow it? I think it would be more
> useful and understandable if we allow to use both
> publish_generated_columns (publisher option) and copy_data (subscriber
> option) at the same time.
>

Support for tablesync with generated columns was not included in the
initial patch, and this was reflected in the documentation. The
functionality for syncing generated column data has been introduced
with the 0002 patch.

The attached v31 patches contain the changes for the same. I won't be
posting the test patch for now. I will share it once this patch has
been stabilized.

Thanks and Regards,
Shubham Khanna.

Attachment Content-Type Size
v31-0001-Enable-support-for-publish_generated_columns-opt.patch application/octet-stream 77.9 KB
v31-0002-Support-replication-of-generated-column-during-i.patch application/octet-stream 14.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2024-09-13 11:35:35 Re: Why don't we consider explicit Incremental Sort?
Previous Message Jim Jones 2024-09-13 11:27:33 Re: Psql meta-command conninfo+