Re: Pgoutput not capturing the generated columns

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Shubham Khanna <khannashubham1197(at)gmail(dot)com>, Rajendra Kumar Dangwal <dangwalrajendra888(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, euler(at)eulerto(dot)com
Subject: Re: Pgoutput not capturing the generated columns
Date: 2024-08-28 06:06:19
Message-ID: CAA4eK1+7BnXQBDHUsQpDsF4gTenzVe1k3RUxXkxEF=vKRVKajQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, May 20, 2024 at 1:49 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Wed, May 8, 2024 at 4:14 PM Shubham Khanna
> <khannashubham1197(at)gmail(dot)com> wrote:
> >
> > On Wed, May 8, 2024 at 11:39 AM Rajendra Kumar Dangwal
> > <dangwalrajendra888(at)gmail(dot)com> wrote:
> > >
> > > Hi PG Hackers.
> > >
> > > We are interested in enhancing the functionality of the pgoutput plugin by adding support for generated columns.
> > > Could you please guide us on the necessary steps to achieve this? Additionally, do you have a platform for tracking such feature requests? Any insights or assistance you can provide on this matter would be greatly appreciated.
> >
> > The attached patch has the changes to support capturing generated
> > column data using the 'pgoutput' and 'test_decoding' plugins. Now if
> > the 'include_generated_columns' option is specified, the generated
> > column information and generated column data will also be sent.
>
> As Euler mentioned earlier, I think it was a deliberate decision not
> to replicate generated columns, because we don't know whether the
> target table on the subscriber has the same expression, and there
> could be locale issues even if it looks the same. I can see that a
> benefit of this proposal would be to save the cost of computing
> generated column values if the user wants the target table on the
> subscriber to have exactly the same data as the publisher's one. Are
> there other benefits or use cases?
>

The cost is one benefit, but another is that the user may not want the
data to differ because of volatile functions like timeofday(), or the
table on the subscriber may not have the column marked as generated.
Now, considering such use cases, is providing a subscription-level
option a good idea, as the patch is doing? I understand that this can
serve the purpose, but it would also apply the same behavior to all
the tables in all the publications for a subscription, which may or
may not be what the user expects. This could lead to some performance
overhead (due to always sending generated columns for all the tables)
in cases where the user needs it only for a subset of tables.
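
To make the second use case concrete, here is a minimal sketch. The
table name, column names, and the expression are made up for
illustration; they are not taken from the patch.

```sql
-- Publisher: b is computed from a.
CREATE TABLE t1 (
    a int PRIMARY KEY,
    b int GENERATED ALWAYS AS (a * 2) STORED
);

-- Subscriber: same columns, but b is an ordinary column here,
-- so nothing on this side can recompute it.
CREATE TABLE t1 (
    a int PRIMARY KEY,
    b int
);
```

If the publisher does not send b, the subscriber has no way to fill it
in, so it is left at its default (NULL in this sketch).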

I think we should consider making it a table-level option when
defining the publication. A few ideas could be: (a) We ask users to
explicitly mention the generated column in the column list while
defining the publication. This has the drawback that users need to
specify the column list even when all columns need to be replicated.
(b) We can have some new syntax to indicate the same, like: CREATE
PUBLICATION pub1 FOR TABLE t1 INCLUDE GENERATED COLS, t2, t3, t4
INCLUDE ..., t5; I haven't analyzed the feasibility of this, so there
could be some challenges, but we can at least investigate it.
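
For comparison, idea (a) might look roughly like the following. This
is only a sketch of the proposed behavior: it assumes a table t1 with
a regular column a and a generated column b (both names are
hypothetical), and it assumes that naming a generated column in the
column list would be accepted and would cause it to be published,
which is the change being discussed rather than something to rely on
today.

```sql
-- Idea (a), as proposed: listing the generated column b explicitly
-- is what requests its replication.
-- The drawback: a must be listed too, even though we simply want
-- every column of t1.
CREATE PUBLICATION pub1 FOR TABLE t1 (a, b);
```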

Yet another idea is to keep this as a publication option
(include_generated_columns or publish_generated_columns), similar to
"publish_via_partition_root". Normally, "publish_via_partition_root"
is used when tables on either side have different partition
hierarchies, which is somewhat the case here.
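
As a rough illustration of that shape: the first statement below uses
the existing publish_via_partition_root option, while the second uses
one of the option names floated above. The second option is
hypothetical at this point, as are the table and publication names.

```sql
-- Existing publication option, for comparison.
CREATE PUBLICATION pub_parted FOR TABLE parted_tab
    WITH (publish_via_partition_root = true);

-- Hypothetical: a publication-level switch for generated columns.
-- The option name and its boolean value are assumptions.
CREATE PUBLICATION pub1 FOR TABLE t1, t2
    WITH (publish_generated_columns = true);
```

A publication-level option would apply to every table in that
publication, which is coarser than the per-table ideas above but
finer than a subscription-level option.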

Thoughts?

--
With Regards,
Amit Kapila.
