Re: Pgoutput not capturing the generated columns

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Shubham Khanna <khannashubham1197(at)gmail(dot)com>, Rajendra Kumar Dangwal <dangwalrajendra888(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, euler(at)eulerto(dot)com
Subject: Re: Pgoutput not capturing the generated columns
Date: 2024-08-29 06:16:35
Message-ID: CAA4eK1+O6Uk0teoKd806=k9eWp_ZxxUoC02r-zCbLSe+DuO3Xw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Aug 29, 2024 at 8:44 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Wed, Aug 28, 2024 at 1:06 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Mon, May 20, 2024 at 1:49 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > As Euler mentioned earlier, I think it's a decision not to replicate
> > > generated columns because we don't know the target table on the
> > > subscriber has the same expression and there could be locale issues
> > > even if it looks the same. I can see that a benefit of this proposal
> > > would be to save cost to compute generated column values if the user
> > > wants the target table on the subscriber to have exactly the same data
> > > as the publisher's one. Are there other benefits or use cases?
> > >
> >
> > The cost is one but the other is the user may not want the data to be
> > different based on volatile functions like timeofday()
>
> Shouldn't the generation expression be immutable?
>

Yes, I missed that point.
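
For reference, a minimal sketch of why the timeofday() case cannot
arise. The table and column names are made up for illustration; the
point is only that a generation expression must be immutable, so a
volatile function is rejected when the table is created.

```sql
-- Hypothetical tables, for illustration only.

-- timeofday() is volatile, so this fails with
-- "ERROR:  generation expression is not immutable".
CREATE TABLE t_bad (
    a int,
    b text GENERATED ALWAYS AS (timeofday()) STORED
);

-- An immutable expression is accepted.
CREATE TABLE t_ok (
    a int,
    b int GENERATED ALWAYS AS (a * 2) STORED
);
```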

> > or the table on
> > subscriber won't have the column marked as generated.
>
> Yeah, it would be another use case.
>

Right, apart from that I am not aware of other use cases. If there
are any, I would request Euler or Rajendra to share them.

> > Now, considering
> > such use cases, is providing a subscription-level option a good idea
> > as the patch is doing? I understand that this can serve the purpose
> > but it could also lead to having the same behavior for all the tables
> > in all the publications for a subscription which may or may not be
> > what the user expects. This could lead to some performance overhead
> > (due to always sending generated columns for all the tables) for cases
> > where the user needs it only for a subset of tables.
>
> Yeah, it's a downside and I think it's less flexible. For example, if
> users want to send both tables with generated columns and tables
> without generated columns, they would have to create at least two
> subscriptions.
>

Agreed, and that would consume more resources.

> Also, they would have to include a different set of
> tables to two publications.
>
> >
> > I think we should consider it as a table-level option while defining
> > publication in some way. A few ideas could be: (a) We ask users to
> > explicitly mention the generated column in the columns list while
> > defining publication. This has a drawback such that users need to
> > specify the column list even when all columns need to be replicated.
> > (b) We can have some new syntax to indicate the same like: CREATE
> > PUBLICATION pub1 FOR TABLE t1 INCLUDE GENERATED COLS, t2, t3, t4
> > INCLUDE ..., t5;. I haven't analyzed the feasibility of this, so there
> > could be some challenges but we can at least investigate it.
>
> I think we can create a publication for a single table, so what we can
> do with this feature can be done also by the idea you described below.
>
> > Yet another idea is to keep this as a publication option
> > (include_generated_columns or publish_generated_columns) similar to
> > "publish_via_partition_root". Normally, "publish_via_partition_root"
> > is used when tables on either side have different partition
> > hierarchies which is somewhat the case here.
>
> It sounds more useful to me.
>

Fair enough. Let's see if anyone else has any preference among the
proposed methods or can think of a better way.
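
To make the comparison concrete, here is a rough sketch of how the
three alternatives might look. The table t1, its columns, and the
publication names are hypothetical. The statements for (a), (b), and
the publication option are proposals from this thread, not working
syntax, so they appear only as comments; the option name is not
settled either. Only the CREATE TABLE and the
publish_via_partition_root example use existing syntax.

```sql
-- Hypothetical table with a stored generated column.
CREATE TABLE t1 (
    id    int PRIMARY KEY,
    price numeric,
    total numeric GENERATED ALWAYS AS (price * 1.1) STORED
);

-- (a) Name the generated column explicitly in the column list.
--     The column-list syntax exists, but as far as I know a generated
--     column is rejected there today; idea (a) would allow it.
-- CREATE PUBLICATION pub_a FOR TABLE t1 (id, price, total);

-- (b) Proposed per-table clause (not implemented, feasibility unknown).
-- CREATE PUBLICATION pub_b FOR TABLE t1 INCLUDE GENERATED COLS, t2;

-- Publication-level option (proposed; either name could be chosen).
-- CREATE PUBLICATION pub_c FOR TABLE t1
--     WITH (publish_generated_columns = true);

-- Existing option shown for comparison of the WITH (...) form.
CREATE PUBLICATION pub_root FOR TABLE t1
    WITH (publish_via_partition_root = true);
```

With (a) and (b) the choice is made per table. With the publication
option it applies to every table in that publication, which is still
finer-grained than a subscription-level setting.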

--
With Regards,
Amit Kapila.
