Re: Data is copied twice when specifying both child and parent table in publication

From: Greg Nancarrow <gregn4422(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Data is copied twice when specifying both child and parent table in publication
Date: 2021-10-20 13:40:53
Message-ID: CAJcOf-d8SWk3z3fJaLW9yuVux=2ESTsXOSdKzCq1O3AWBpgnMQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Oct 20, 2021 at 9:19 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> I don't see why data need to be replicated again even in that case.
> Can you see any such duplicate data replicated for non-partitioned
> tables?
>

If my example is slightly modified to use the same-named tables on the
subscriber side, but without partitioning, i.e.:

PUB:

CREATE SCHEMA sch;
CREATE SCHEMA sch1;
CREATE TABLE sch.sale (sale_date date not null, country_code text,
product_sku text, units integer) PARTITION BY RANGE (sale_date);
CREATE TABLE sch1.sale_201901 PARTITION OF sch.sale FOR VALUES FROM
('2019-01-01') TO ('2019-02-01');
CREATE TABLE sch1.sale_201902 PARTITION OF sch.sale FOR VALUES FROM
('2019-02-01') TO ('2019-03-01');

SUB:

CREATE SCHEMA sch;
CREATE SCHEMA sch1;
CREATE TABLE sch.sale (sale_date date not null, country_code text,
product_sku text, units integer);
CREATE TABLE sch1.sale_201901 (sale_date date not null, country_code
text, product_sku text, units integer);
CREATE TABLE sch1.sale_201902 (sale_date date not null, country_code
text, product_sku text, units integer);

then the INSERTed data on the publisher side gets replicated to the
subscriber's "sch1.sale_201901" and "sch1.sale_201902" tables (only),
depending on the date values.
Now if the partitioned table is then added to the publication and
ALTER SUBSCRIPTION ... REFRESH PUBLICATION done by the subscriber,
then the current functionality is that the existing sch.sale data is
replicated (only) to the subscriber's "sch.sale" table (even though
data had been replicated previously to the "sch1.sale_201901" and
"sch1.sale_201902" tables, only).
So, just to be clear, you think that this current functionality isn't
correct (i.e. no data should be replicated on the REFRESH in this
case)?
I think it's debatable because here copy_data=true and sch.sale was
not a previously-subscribed table (so pre-existing data in that table
should be copied, in accordance with the current documentation).

Regards,
Greg Nancarrow
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2021-10-20 13:40:56 Re: ThisTimeLineID can be used uninitialized
Previous Message Alvaro Herrera 2021-10-20 13:39:15 Re: pgsql: Document XLOG_INCLUDE_XID a little better