Re: Pgoutput not capturing the generated columns

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Shubham Khanna <khannashubham1197(at)gmail(dot)com>, Rajendra Kumar Dangwal <dangwalrajendra888(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, euler(at)eulerto(dot)com
Subject: Re: Pgoutput not capturing the generated columns
Date: 2024-08-29 03:13:31
Message-ID: CAD21AoBkkuMHhr-SOHvGprPmDDL4thpGcDTofDZ6apzaU7H_BQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Aug 28, 2024 at 1:06 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Mon, May 20, 2024 at 1:49 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Wed, May 8, 2024 at 4:14 PM Shubham Khanna
> > <khannashubham1197(at)gmail(dot)com> wrote:
> > >
> > > On Wed, May 8, 2024 at 11:39 AM Rajendra Kumar Dangwal
> > > <dangwalrajendra888(at)gmail(dot)com> wrote:
> > > >
> > > > Hi PG Hackers.
> > > >
> > > > We are interested in enhancing the functionality of the pgoutput plugin by adding support for generated columns.
> > > > Could you please guide us on the necessary steps to achieve this? Additionally, do you have a platform for tracking such feature requests? Any insights or assistance you can provide on this matter would be greatly appreciated.
> > >
> > > The attached patch has the changes to support capturing generated
> > > column data using the 'pgoutput' and 'test_decoding' plugins. Now if the
> > > 'include_generated_columns' option is specified, the generated column
> > > information and generated column data will also be sent.
> >
> > As Euler mentioned earlier, I think it was a deliberate decision not to replicate
> > generated columns, because we don't know whether the target table on the
> > subscriber has the same expression, and there could be locale issues
> > even if it looks the same. I can see that a benefit of this proposal
> > would be to save the cost of computing generated column values if the user
> > wants the target table on the subscriber to have exactly the same data
> > as the publisher's. Are there other benefits or use cases?
> >
>
> Cost is one reason, but another is that the user may not want the data to be
> different based on volatile functions like timeofday()

Shouldn't the generation expression be immutable?

> or the table on
> subscriber won't have the column marked as generated.

Yeah, it would be another use case.

> Now, considering
> such use cases, is providing a subscription-level option, as the patch
> does, a good idea? I understand that this can serve the purpose,
> but it would also force the same behavior on all the tables
> in all the publications of a subscription, which may or may not be
> what the user expects. This could lead to some performance overhead
> (due to always sending generated columns for all the tables) in cases
> where the user needs it only for a subset of tables.

Yeah, it's a downside, and I think it's less flexible. For example, if
users want to send both tables with generated columns and tables
without generated columns, they would have to create at least two
subscriptions. They would also have to put the two sets of tables
into different publications.
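To make the flexibility problem concrete, here is a toy Python model of a single per-subscription flag in the spirit of the patch's include_generated_columns option. All class and field names are hypothetical and do not correspond to PostgreSQL internals. Because the flag applies to every table of every publication the subscription covers, getting mixed behavior means splitting the tables across two subscriptions.

```python
from dataclasses import dataclass, field

# Toy model only: names are hypothetical, not PostgreSQL internals.

@dataclass
class Table:
    name: str
    regular: list[str]
    generated: list[str] = field(default_factory=list)

@dataclass
class Subscription:
    publications: dict[str, list[Table]]  # publication name -> its tables
    include_generated_columns: bool = False  # one flag for all tables

    def columns_sent(self) -> dict[str, list[str]]:
        """Columns sent for each table under this toy model."""
        out: dict[str, list[str]] = {}
        for tables in self.publications.values():
            for t in tables:
                cols = list(t.regular)
                if self.include_generated_columns:
                    cols += t.generated
                out[t.name] = cols
        return out

t1 = Table("t1", ["a"], ["b"])  # b is a generated column
t2 = Table("t2", ["x"], ["y"])  # y is a generated column

# One subscription cannot send b for t1 while omitting y for t2 ...
single = Subscription({"pub_all": [t1, t2]}, include_generated_columns=True)
print(single.columns_sent())  # {'t1': ['a', 'b'], 't2': ['x', 'y']}

# ... so the tables must be split across two publications and two subscriptions.
sub_gen = Subscription({"pub_gen": [t1]}, include_generated_columns=True)
sub_plain = Subscription({"pub_plain": [t2]})
print(sub_gen.columns_sent(), sub_plain.columns_sent())
```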

>
> I think we should consider it as a table-level option while defining
> a publication in some way. A few ideas could be: (a) We ask users to
> explicitly mention the generated column in the column list while
> defining the publication. This has the drawback that users need to
> specify the column list even when all columns need to be replicated.
> (b) We can have some new syntax to indicate the same like: CREATE
> PUBLICATION pub1 FOR TABLE t1 INCLUDE GENERATED COLS, t2, t3, t4
> INCLUDE ..., t5;. I haven't analyzed the feasibility of this, so there
> could be some challenges but we can at least investigate it.

Since we can create a publication for a single table, whatever this
syntax achieves can also be achieved with the idea you describe below.

> Yet another idea is to keep this as a publication option
> (include_generated_columns or publish_generated_columns) similar to
> "publish_via_partition_root". Normally, "publish_via_partition_root"
> is used when tables on either side have different partition
> hierarchies which is somewhat the case here.

It sounds more useful to me.
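For comparison, here is a similar toy model of the publication-level option. Each publication carries its own flag, the way publish_via_partition_root is set per publication, so a single subscription to two publications can receive generated columns for t1 but not for t2. The flag name publish_generated_columns is taken from the suggestion above, and rejecting conflicting settings for a table that appears in two publications is an assumption made for illustration; the thread does not decide that case.

```python
from dataclasses import dataclass, field

# Toy model only: names are hypothetical, not PostgreSQL internals.

@dataclass
class Table:
    name: str
    regular: list[str]
    generated: list[str] = field(default_factory=list)

@dataclass
class Publication:
    name: str
    tables: list[Table]
    # Per-publication flag, modeled on publish_via_partition_root.
    publish_generated_columns: bool = False

def columns_sent(publications: list[Publication]) -> dict[str, list[str]]:
    """Columns sent per table for one subscription to several publications."""
    out: dict[str, list[str]] = {}
    flag_for: dict[str, bool] = {}
    for pub in publications:
        for t in pub.tables:
            prev = flag_for.setdefault(t.name, pub.publish_generated_columns)
            if prev != pub.publish_generated_columns:
                # Assumption: conflicting settings for one table are rejected.
                raise ValueError(f"conflicting publish_generated_columns for {t.name}")
            cols = list(t.regular)
            if pub.publish_generated_columns:
                cols += t.generated
            out[t.name] = cols
    return out

t1 = Table("t1", ["a"], ["b"])
t2 = Table("t2", ["x"], ["y"])
pubs = [Publication("pub_gen", [t1], publish_generated_columns=True),
        Publication("pub_plain", [t2])]
print(columns_sent(pubs))  # {'t1': ['a', 'b'], 't2': ['x']}
```

Here a single subscription gets per-table control without splitting, which is the flexibility the subscription-level option lacks.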

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
