From: | Greg Nancarrow <gregn4422(at)gmail(dot)com> |
---|---|
To: | Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Data is copied twice when specifying both child and parent table in publication |
Date: | 2021-10-20 09:00:18 |
Message-ID: | CAJcOf-fHq5Mca2sf7MqckwkXGLfjqiKboKsDNnywC-jnvM_BBQ@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, Oct 20, 2021 at 7:02 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> > Actually, at least with the scenario I gave steps for, after looking
> > at it again and debugging, I think that the behavior is understandable
> > and not a bug.
> > The reason is that the INSERTed data is first published through the
> > partitions, since initially there is no partitioned table in the
> > publication (so publish_via_partition_root=true doesn't have any
> > effect). But then adding the partitioned table to the publication and
> > refreshing the publication in the subscriber, the data is then
> > published "using the identity and schema of the partitioned table" due
> > to publish_via_partition_root=true. Note that the corresponding table
> > in the subscriber may well be a non-partitioned table (or the
> > partitions arranged differently) so the data does need to be
> > replicated again.
>
> I don't think this behavior is consistent, I mean for the initial sync
> we will replicate the duplicate data, whereas for later streaming we
> will only replicate it once. From the user POV, this behavior doesn't
> look correct.
>
The scenario I gave steps for didn't have any table data when the
subscription was made, so the initial sync did not replicate any data.
I was referring to the double-publish that occurs when
publish_via_partition_root=true and then the partitioned table is
added to the publication and the subscriber does ALTER SUBSCRIPTION
... REFRESH PUBLICATION.

If I modify my example to include both the partitioned table and
(explicitly) its child partitions in the publication, and insert some
data on the publisher side prior to the subscription, then I am seeing
duplicate data on the initial sync on the subscriber side, and I would
agree that this doesn't seem correct.
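
To make the duplication concrete, here is a toy Python model of what I believe is happening. It is only a sketch under an assumption: that the initial sync copies every table listed in the publication independently, and that copying a partitioned table reads the rows of all its partitions. It is not the actual tablesync code, and the names parent, child1, child2 and initial_sync are made up for illustration.

```python
from collections import Counter


def initial_sync(published_tables, partitions_of, rows_by_leaf):
    """Toy model (assumption): copy each published table once, independently.

    published_tables: table names listed in the publication
    partitions_of:    maps a partitioned table to its leaf partitions
    rows_by_leaf:     maps each leaf partition to the rows stored in it
    Returns a Counter of how many times each row reaches the subscriber.
    """
    received = Counter()
    for table in published_tables:
        # Copying a partitioned table reads the rows of all its partitions;
        # copying a leaf partition reads only that partition's rows.
        for leaf in partitions_of.get(table, [table]):
            received.update(rows_by_leaf.get(leaf, []))
    return received


# Hypothetical layout: one partitioned table with two leaf partitions.
partitions_of = {"parent": ["child1", "child2"]}
rows_by_leaf = {"child1": [1, 2], "child2": [11]}

# Publication lists the parent and, explicitly, both of its partitions.
both = initial_sync(["parent", "child1", "child2"], partitions_of, rows_by_leaf)

# Publication lists only the parent.
root_only = initial_sync(["parent"], partitions_of, rows_by_leaf)

print("parent + partitions:", dict(both))
print("parent only:        ", dict(root_only))
```

In this model every row arrives twice when both the parent and its partitions are listed, and once when only the parent is listed, which matches the duplicate data I see on the subscriber.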
Regards,
Greg Nancarrow
Fujitsu Australia