Re: Pgoutput not capturing the generated columns

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Shubham Khanna <khannashubham1197(at)gmail(dot)com>, Rajendra Kumar Dangwal <dangwalrajendra888(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, euler(at)eulerto(dot)com
Subject: Re: Pgoutput not capturing the generated columns
Date: 2024-09-23 12:10:58
Message-ID: CALDaNm0k26WNvHw8fyY5-=i489tze1gzhLo3qEuh+A=ZwqEtqQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 20 Sept 2024 at 04:16, Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> On Fri, Sep 20, 2024 at 3:26 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Thu, Sep 19, 2024 at 2:32 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> ...
> > > I think that the column list should take priority and we should
> > > publish the generated column if it is mentioned in irrespective of
> > > the option.
> >
> > Agreed.
> >
> > >
> ...
> > >
> > > Users can use a publication like "create publication pub1 for table
> > > t1(c1, c2), t2;" where they want t1's generated column to be published
> > > but not for t2. They can specify the generated column name in the
> > > column list of t1 in that case even though the rest of the tables
> > > won't publish generated columns.
> >
> > Agreed.
> >
> > I think that users can use the publish_generated_column option when
> > they want to publish all generated columns, instead of specifying all
> > the columns in the column list. It's another advantage of this option
> > that it will also include the future generated columns.
> >
>
> OK. Let me give some examples below to help understand this idea.
>
> Please correct me if these are incorrect.
>
> ======
>
> Assuming these tables:
>
> t1(a,b,gen1,gen2)
> t2(c,d,gen1,gen2)
>
> Examples, when publish_generated_columns=false:
>
> CREATE PUBLICATION pub1 FOR t1(a,b,gen2), t2 WITH
> (publish_generated_columns=false)
> t1 -> publishes a, b, gen2 (e.g. what column list says)
> t2 -> publishes c, d
>
> CREATE PUBLICATION pub1 FOR t1, t2(gen1) WITH (publish_generated_columns=false)
> t1 -> publishes a, b
> t2 -> publishes gen1 (e.g. what column list says)
>
> CREATE PUBLICATION pub1 FOR t1, t2 WITH (publish_generated_columns=false)
> t1 -> publishes a, b
> t2 -> publishes c, d
>
> CREATE PUBLICATION pub1 FOR ALL TABLES WITH (publish_generated_columns=false)
> t1 -> publishes a, b
> t2 -> publishes c, d
>
> ~~
>
> Examples, when publish_generated_columns=true:
>
> CREATE PUBLICATION pub1 FOR t1(a,b,gen2), t2 WITH
> (publish_generated_columns=true)
> t1 -> publishes a, b, gen2 (e.g. what column list says)
> t2 -> publishes c, d + ALSO gen1, gen2
>
> CREATE PUBLICATION pub1 FOR t1, t2(gen1) WITH (publish_generated_columns=true)
> t1 -> publishes a, b + ALSO gen1, gen2
> t2 -> publishes gen1 (e.g. what column list says)
>
> CREATE PUBLICATION pub1 FOR t1, t2 WITH (publish_generated_columns=true)
> t1 -> publishes a, b + ALSO gen1, gen2
> t2 -> publishes c, d + ALSO gen1, gen2
>
> CREATE PUBLICATION pub1 FOR ALL TABLES WITH (publish_generated_columns=true)
> t1 -> publishes a, b + ALSO gen1, gen2
> t2 -> publishes c, d + ALSO gen1, gen2
>
> ======
>
> The idea LGTM, although now the parameter name
> ('publish_generated_columns') seems a bit misleading since sometimes
> generated columns get published "irrespective of the option".
>
> So, I think the original parameter name 'include_generated_columns'
> might be better here because IMO "include" seems more like "add them
> if they are not already specified", which is exactly what this idea is
> doing.
>
> Thoughts?

I have verified the various scenarios discussed here and the patch
works as expected with v32 version patch shared at [1]:

Test presetup:
-- publisher
CREATE TABLE t1 (a int PRIMARY KEY, b int, gen1 int GENERATED ALWAYS
AS (a * 2) STORED, gen2 int GENERATED ALWAYS AS (a * 2) STORED);
CREATE TABLE t2 (c int PRIMARY KEY, d int, gen1 int GENERATED ALWAYS
AS (c * 2) STORED, gen2 int GENERATED ALWAYS AS (d * 2) STORED);

-- subscriber
CREATE TABLE t1 (a int PRIMARY KEY, b int, gen1 int, gen2 int);
CREATE TABLE t2 (c int PRIMARY KEY, d int, gen1 int, gen1 int);

Test1: Publisher replicates the column list data including generated
columns even though publish_generated_columns option is false:
Publisher:
CREATE PUBLICATION pub1 FOR table t1, t2(gen1) WITH
(publish_generated_columns=false)
insert into t1 values(1,1);
insert into t2 values(1,1);

Subscriber:
--t1 -> publishes a, b
subscriber=# select * from t1;
a | b | gen1 | gen2
---+---+------+------
1 | 1 | |
(1 row)

--t2 -> publishes gen1 (e.g. what column list says)
subscriber=# select * from t2;
c | d | gen1 | gen2
---+---+------+------
| | 2 |
(1 row)

Test2: Publisher does not replication gen column if
publish_generated_columns option is false
Publisher:
CREATE PUBLICATION pub1 FOR table t1, t2 WITH (publish_generated_columns=false)
insert into t1 values(1,1);
insert into t2 values(1,1);

Subscriber:
--t1 -> publishes a, b
subscriber=# select * from t1;
a | b | gen1 | gen2
---+---+------+------
1 | 1 | |
(1 row)

-- t2 -> publishes c, d
subscriber=# select * from t2;
c | d | gen1 | gen2
---+---+------+------
1 | 1 | |
(1 row)

Test3: Publisher does not replication gen column if
publish_generated_columns option is false
Publisher:
CREATE PUBLICATION pub1 FOR ALL TABLES WITH (publish_generated_columns=false)
insert into t1 values(1,1);
insert into t2 values(1,1);

Subscriber:
--t1 -> publishes a, b
subscriber=# select * from t1;
a | b | gen1 | gen2
---+---+------+------
1 | 1 | |
(1 row)

-- t2 -> publishes c, d
subscriber=# select * from t2;
c | d | gen1 | gen2
---+---+------+------
1 | 1 | |
(1 row)

Test4: Publisher publishes only the data of the columns specified in
column list skipping other generated/non-generated columns:
Publisher:
CREATE PUBLICATION pub1 FOR table t1(a,b,gen2), t2 WITH
(publish_generated_columns=true)
insert into t1 values(1,1);
insert into t2 values(1,1);

Subscriber:
-- t1 -> publishes a, b, gen2 (e.g. what column list says)
subscriber=# select * from t1;
a | b | gen1 | gen2
---+---+------+------
1 | 1 | | 2
(1 row)

-- t2 -> publishes c, d + ALSO gen1, gen2
subscriber=# select * from t2;
c | d | gen1 | gen2
---+---+------+------
1 | 1 | 2 | 2
(1 row)

Test5: Publisher publishes only the data of the columns specified in
column list skipping other generated/non-generated columns:
Publisher:
CREATE PUBLICATION pub1 FOR table t1, t2(gen1) WITH
(publish_generated_columns=true)
insert into t1 values(1,1);
insert into t2 values(1,1);

Subscriber:
-- t1 -> publishes a, b + ALSO gen1, gen2
subscriber=# select * from t1;
a | b | gen1 | gen2
---+---+------+------
1 | 1 | 2 | 2
(1 row)

-- t2 -> publishes gen1 (e.g. what column list says)
subscriber=# select * from t2;
c | d | gen1 | gen2
---+---+------+------
| | 2 |
(1 row)

Test6: Publisher replicates all columns if publish_generated_columns
is enabled without column list
Publisher:
CREATE PUBLICATION pub1 FOR table t1, t2 WITH (publish_generated_columns=true)
insert into t1 values(1,1);
insert into t2 values(1,1);

Subscriber:
-- t1 -> publishes a, b + ALSO gen1, gen2
subscriber=# select * from t1;
a | b | gen1 | gen2
---+---+------+------
1 | 1 | 2 | 2
(1 row)

-- t2 -> publishes c, d + ALSO gen1, gen2
subscriber=# select * from t2;
c | d | gen1 | gen2
---+---+------+------
1 | 1 | 2 | 2
(1 row)

Test7: Publisher replicates all columns if publish_generated_columns
is enabled without column list
Publisher:
CREATE PUBLICATION pub1 FOR ALL TABLES WITH (publish_generated_columns=true)
insert into t1 values(1,1);
insert into t2 values(1,1);

Subscriber:
-- t1 -> publishes a, b + ALSO gen1, gen2
subscriber=# select * from t1;
a | b | gen1 | gen2
---+---+------+------
1 | 1 | 2 | 2
(1 row)

-- t2 -> publishes c, d + ALSO gen1, gen2
subscriber=# select * from t2;
c | d | gen1 | gen2
---+---+------+------
1 | 1 | 2 | 2
(1 row)

[1] - https://www.postgresql.org/message-id/CAHv8RjKkoaS1oMsFvPRFB9nPSVC5p_D4Kgq5XB9Y2B2xU7smbA%40mail.gmail.com

Regards,
Vignesh

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message vignesh C 2024-09-23 12:16:45 Re: Pgoutput not capturing the generated columns
Previous Message Peter Eisentraut 2024-09-23 11:58:56 Re: Converting README documentation to Markdown