Re: Pgoutput not capturing the generated columns

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, Shubham Khanna <khannashubham1197(at)gmail(dot)com>, Rajendra Kumar Dangwal <dangwalrajendra888(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, euler(at)eulerto(dot)com
Subject: Re: Pgoutput not capturing the generated columns
Date: 2024-09-19 17:25:42
Message-ID: CAD21AoAFAKTed6cGZ+z7WBds09kixDN27cyy9BMxCFSCrNSGaw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Sep 19, 2024 at 2:32 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Tue, Sep 17, 2024 at 12:04 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> >
> > On Tue, Sep 17, 2024 at 4:15 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > On Mon, Sep 16, 2024 at 8:09 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> > > >
> > > > I thought that the option "publish_generated_columns" is more related
> > > > to "column lists" than "row filters".
> > > >
> > > > Let's say table 't1' has columns 'a', 'b', 'c', 'gen1', 'gen2'.
> > > >
> > >
> > > > And
> > > > PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = false);
> > > > is equivalent to
> > > > PUBLICATION pub2 FOR TABLE t1(a,b,c);
> > >
> > > This makes sense to me as it preserves the current behavior.
> > >
> > > > Then:
> > > > PUBLICATION pub1 FOR TABLE t1 WITH (publish_generated_columns = true);
> > > > is equivalent to
> > > > PUBLICATION pub1 FOR TABLE t1(a,b,c,gen1,gen2);
> > >
> > > This also makes sense. It would also include future generated columns.
> > >
> > > > So, I would expect this to fail because the SUBSCRIPTION docs say
> > > > "Subscriptions having several publications in which the same table has
> > > > been published with different column lists are not supported."
> > >
> > > So I agree that it would raise an error if users subscribe to both
> > > pub1 and pub2.
> > >
> > > And looking back at your examples,
> > >
> > > > > > e.g.1
> > > > > > -----
> > > > > > CREATE PUBLICATION pub1 FOR TABLE t1 WITH (publish_generated_columns = true);
> > > > > > CREATE PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = false);
> > > > > > CREATE SUBSCRIPTION sub ... PUBLICATIONS pub1,pub2;
> > > > > > -----
> > > > > >
> > > > > > e.g.2
> > > > > > -----
> > > > > > CREATE PUBLICATION pub1 FOR ALL TABLES WITH (publish_generated_columns = true);
> > > > > > CREATE PUBLICATION pub2 FOR TABLE t1 WITH (publish_generated_columns = false);
> > > > > > CREATE SUBSCRIPTION sub ... PUBLICATIONS pub1,pub2;
> > > > > > -----
> > >
> > > Both examples would not be supported.
> > >
> > > > > >
> > > > > > The CREATE SUBSCRIPTION docs [1] only says "Subscriptions having
> > > > > > several publications in which the same table has been published with
> > > > > > different column lists are not supported."
> > > > > >
> > > > > > Perhaps the user is supposed to deduce that the example above would
> > > > > > work OK if table 't1' has no generated cols. OTOH, if it did have
> > > > > > generated cols then the PUBLICATION column lists must be different and
> > > > > > therefore it is "not supported" (??).
> > > > >
> > > > > With the patch, how should this feature work when users specify a
> > > > > generated column to the column list and set publish_generated_column =
> > > > > false, in the first place? raise an error (as we do today)? or always
> > > > > send NULL?
> > > >
> > > > For this scenario, I suggested (see [1] #3) that the code could give a
> > > > WARNING. As I wrote up-thread: This combination doesn't seem
> > > > like something a user would do intentionally, so just silently
> > > > ignoring it (which the current patch does) is likely going to give
> > > > someone unexpected results/grief.
> > >
> > > It gives a WARNING, and then publishes the specified generated column
> > > data (even if publish_generated_column = false)?
>
>
> I think that the column list should take priority and we should
> publish the generated column if it is mentioned in irrespective of
> the option.

Agreed.

>
> > > If so, it would mean
> > > that specifying the generated column to the column list means to
> > > publish its data regardless of the publish_generated_column parameter
> > > value.
> > >
> >
> > No. I meant only it can give the WARNING to tell the user user "Hey,
> > there is a conflict here because you said publish_generated_column=
> > false, but you also specified gencols in the column list".
> >
>
> Users can use a publication like "create publication pub1 for table
> t1(c1, c2), t2;" where they want t1's generated column to be published
> but not for t2. They can specify the generated column name in the
> column list of t1 in that case even though the rest of the tables
> won't publish generated columns.

Agreed.

I think that users can use the publish_generated_column option when
they want to publish all generated columns, instead of specifying all
the columns in the column list. It's another advantage of this option
that it will also include the future generated columns.

Given that we publish the generated columns if they are mentioned in
the column list, can we separate the patch into two if it helps
reviews? One is to allow logical replication to publish generated
columns if they are explicitly mentioned in the column list. The
second patch is to introduce the publish_generated_columns option.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2024-09-19 17:44:32 Re: Should rolpassword be toastable?
Previous Message Robert Haas 2024-09-19 17:23:40 Re: First draft of PG 17 release notes