From: | Shubham Khanna <khannashubham1197(at)gmail(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Rajendra Kumar Dangwal <dangwalrajendra888(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, euler(at)eulerto(dot)com |
Subject: | Re: Pgoutput not capturing the generated columns |
Date: | 2024-09-09 09:38:19 |
Message-ID: | CAHv8RjJ8-pgubrG1v7i8C8Hc1i6WWvGKW4Sq-GP73jsBDXkn8A@mail.gmail.com |
Lists: | pgsql-hackers |
On Thu, Aug 29, 2024 at 11:46 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Thu, Aug 29, 2024 at 8:44 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Wed, Aug 28, 2024 at 1:06 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Mon, May 20, 2024 at 1:49 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > >
> > > > As Euler mentioned earlier, I think it's a decision not to replicate
> > > > generated columns because we don't know the target table on the
> > > > subscriber has the same expression and there could be locale issues
> > > > even if it looks the same. I can see that a benefit of this proposal
> > > > would be to save cost to compute generated column values if the user
> > > > wants the target table on the subscriber to have exactly the same data
> > > > as the publisher's one. Are there other benefits or use cases?
> > > >
> > >
> > > The cost is one but the other is the user may not want the data to be
> > > different based on volatile functions like timeofday()
> >
> > Shouldn't the generation expression be immutable?
> >
>
> Yes, I missed that point.
>
> > > or the table on
> > > subscriber won't have the column marked as generated.
> >
> > Yeah, it would be another use case.
> >
>
> Right, apart from that I am not aware of other use cases. If Euler or
> Rajendra have any others, I would request them to share those use cases.
>
> > > Now, considering
> > > such use cases, is providing a subscription-level option a good idea
> > > as the patch is doing? I understand that this can serve the purpose
> > > but it could also lead to having the same behavior for all the tables
> > > in all the publications for a subscription which may or may not be
> > > what the user expects. This could lead to some performance overhead
> > > (due to always sending generated columns for all the tables) for cases
> > > where the user needs it only for a subset of tables.
> >
> > Yeah, it's a downside and I think it's less flexible. For example, if
> > users want to send both tables with generated columns and tables
> > without generated columns, they would have to create at least two
> > subscriptions.
> >
>
> Agreed and that would consume more resources.
>
> > Also, they would have to include a different set of
> > tables to two publications.
> >
> > >
> > > I think we should consider it as a table-level option while defining
> > > publication in some way. A few ideas could be: (a) We ask users to
> > > explicitly mention the generated column in the columns list while
> > > defining publication. This has a drawback such that users need to
> > > specify the column list even when all columns need to be replicated.
> > > (b) We can have some new syntax to indicate the same like: CREATE
> > > PUBLICATION pub1 FOR TABLE t1 INCLUDE GENERATED COLS, t2, t3, t4
> > > INCLUDE ..., t5;. I haven't analyzed the feasibility of this, so there
> > > could be some challenges but we can at least investigate it.
> >
> > I think we can create a publication for a single table, so what we can
> > do with this feature can be done also by the idea you described below.
> >
> > > Yet another idea is to keep this as a publication option
> > > (include_generated_columns or publish_generated_columns) similar to
> > > "publish_via_partition_root". Normally, "publish_via_partition_root"
> > > is used when tables on either side have different partition
> > > hierarchies, which is somewhat the case here.
> >
> > It sounds more useful to me.
> >
>
> Fair enough. Let's see if anyone else has any preference among the
> proposed methods or can think of a better way.
I have addressed this by following the publication-level approach: the
patch adds a 'publish_generated_columns' publication option on the
publisher side, along with new test cases for it.
The attached patches contain the changes.
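To illustrate why a per-publication option is more flexible than a single subscription-level one, here is a minimal Python model (not PostgreSQL code). The names `Column`, `Table`, `Publication`, `columns_to_send`, and `plan_subscription` are hypothetical, `publish_generated_columns` is assumed to be a simple boolean per publication, and the model assumes each table belongs to exactly one of the subscribed publications. It shows one subscription receiving the generated column of t1 (through one publication) but not that of t2 (through another). A single subscription-wide flag could only express this with two subscriptions.

```python
from dataclasses import dataclass


@dataclass
class Column:
    name: str
    generated: bool = False  # True models a GENERATED ALWAYS AS (...) STORED column


@dataclass
class Table:
    name: str
    columns: list[Column]


@dataclass
class Publication:
    name: str
    tables: list[Table]
    # Hypothetical per-publication flag modelled on the proposed
    # publish_generated_columns option (assumed boolean in this sketch).
    publish_generated_columns: bool = False


def columns_to_send(table: Table, pub: Publication) -> list[str]:
    """Return the column names this publication would ship for `table`."""
    return [
        c.name
        for c in table.columns
        if not c.generated or pub.publish_generated_columns
    ]


def plan_subscription(pubs: list[Publication]) -> dict[str, list[str]]:
    """Map each published table to the columns sent to one subscription.

    Assumption: every table appears in exactly one of the subscribed
    publications, so no conflict between differing flags has to be resolved.
    """
    plan: dict[str, list[str]] = {}
    for pub in pubs:
        for table in pub.tables:
            plan[table.name] = columns_to_send(table, pub)
    return plan


if __name__ == "__main__":
    t1 = Table("t1", [Column("a"), Column("b", generated=True)])
    t2 = Table("t2", [Column("x"), Column("y", generated=True)])
    pub_gen = Publication("pub_gen", [t1], publish_generated_columns=True)
    pub_plain = Publication("pub_plain", [t2])
    print(plan_subscription([pub_gen, pub_plain]))
```

Running it prints `{'t1': ['a', 'b'], 't2': ['x']}`. The flag lives on the publication, so the choice is made per set of tables rather than once for everything a subscription receives.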
Thanks and Regards,
Shubham Khanna.
Attachment | Content-Type | Size |
---|---|---|
v30-0002-Tap-tests-for-generated-columns.patch | application/octet-stream | 18.7 KB |
v30-0001-Enable-support-for-publish_generated_columns-opt.patch | application/octet-stream | 88.3 KB |