From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | Amit Langote <amitlangote09(at)gmail(dot)com> |
Cc: | "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
Subject: | Re: Data is copied twice when specifying both child and parent table in publication |
Date: | 2021-11-03 10:13:30 |
Message-ID: | CAA4eK1L6XPtxUGsDdWLvQ75tKCoowkUrcSdLjcf9qgOX_-P5ow@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Oct 28, 2021 at 1:55 PM Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
>
> On Thu, Oct 28, 2021 at 4:35 PM houzj(dot)fnst(at)fujitsu(dot)com
> <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> > As there are basically two separate issues mentioned in the thread, I tried to
> > summarize the discussion so far which might be helpful to others.
> >
> > * The first issue[1]:
> >
> > If we include both the partitioned table and (explicitly) its child partitions
> > in the publication when set publish_via_partition_root=true, like:
> > ---
> > CREATE PUBLICATION pub FOR TABLE parent_table, child_table with (publish_via_partition_root=on);
> > ---
> > It could execute initial sync for both the partitioned(parent_table) table and
> > (explicitly) its child partitions(child_table) which cause duplication of
> > data in partition(child_table) in subscriber side.
> >
> > The reasons I considered this behavior a bug are:
> >
> > a) In this case, the behavior of initial sync is inconsistent with the behavior
> > of transaction streaming. All changes in the leaf the partition will be applied
> > using the identity and schema of the partitioned(root) table. But for the
> > initial sync, it will execute sync for both the partitioned(root) table and
> > (explicitly) its child partitions which cause duplication of data.
> >
> > b) The behavior of FOR TABLE is inconsistent with the behavior of FOR ALL TABLE.
> > If user create a FOR ALL TABLE publication and set publish_via_partition_root=true,
> > then only the top most partitioned(root) table will execute initial sync.
> >
> > IIRC, most people in this thread agreed that the current behavior is not
> > expected. So, maybe it's time to try to fix it.
> >
> > Attach my fix patch here. The patch try to fix this issue by making the
> > pg_publication_tables view only show partitioned table when
> > publish_via_partition_root is true.
> >
> >
> > * The second issue[2]:
> > -----
> > CREATE TABLE sale (sale_date date not null,country_code text, product_sku text,
> > units integer) PARTITION BY RANGE (sale_date);
> > CREATE TABLE sale_201901 PARTITION OF sale FOR VALUES FROM ('2019-01-01') TO
> > ('2019-02-01');
> > CREATE TABLE sale_201902 PARTITION OF sale FOR VALUES FROM ('2019-02-01') TO
> > ('2019-03-01');
> >
> > (1) PUB: CREATE PUBLICATION pub FOR TABLE sale_201901,
> > sale_201902 WITH (publish_via_partition_root=true);
> > (2) SUB: CREATE SUBSCRIPTION sub CONNECTION 'dbname=postgres host=localhost port=5432' PUBLICATION pub;
> > (3) PUB: INSERT INTO sale VALUES('2019-01-01', 'AU', 'cpu', 5), ('2019-01-02', 'AU', 'disk', 8);
> > (4) SUB: SELECT * FROM sale;
> > (5) PUB: ALTER PUBLICATION pub ADD TABLE sale;
> > (6) SUB: ALTER SUBSCRIPTION sub REFRESH PUBLICATION;
> > (7) SUB: SELECT * FROM sale;
> > -----
> >
> > In step (7), we can see duplication of data.
> >
> > The reason is that the INSERTed data is first published though the partitions,
> > since initially there is no partitioned table in the publication (so
> > publish_via_partition_root=true doesn't have any effect). But then adding the
> > partitioned table to the publication and refreshing the publication in the
> > subscriber, the data is then published "using the identity and schema of the
> > partitioned table" due to publish_via_partition_root=true.
> > (Copied from Greg's analysis).
> >
> > Whether this behavior is correct is still under debate.
> >
> >
> > Overall, I think the second issue still needs further discussion while the
> > first issue seems clear that most people think it's unexpected. So, I think it
> > might be better to fix the first issue.
>
> Thanks for the summary, Hou-san, and sorry about my late reply.
>
> I had thought about this some last week and I am coming around to
> recognizing the confusing user experience of the current behavior.
> Though, I am not sure if removing partitions from the result of
> pg_publication_tables view for pubviaroot publications is acceptable
> as a fix, because that prevents replicating into a subscriber node
> where tables that are partition root and a partition respectively on
> the publisher are independent tables on the subscriber.
>
But we already do that way when the publication is "For All Tables".
Anyway, for the purpose of initial sync, it will just replicate the
same data in two different tables if the corresponding tables on the
subscriber-side are non-partitioned which I am not sure is what the
user will be expecting.
> ISTM that
> what Amit K mentioned in his first reply may be closer to what we may
> ultimately need to do, which is this:
>
> "I think one possible idea to investigate is that on the
> subscriber-side, after fetching tables, we check the already
> subscribed tables and if the child tables already exist then we ignore
> the parent table and vice versa."
>
> I had also thought about a way to implement that a bit and part of
> that is to make pg_publication_tables always expose leaf partitions,
> that is, even if pubviaroot is true for a given publication. So, when
> puviaroot is true we include the partition root actually mentioned in
> the publication as we do currently (changes streamed after initial
> sync will always use its schema so the subscriber better know about
> it), and also leaf partitions which we currently don't. The latter
> too so that a subscriber can determine which leaf partitions to sync
> and which ones to not based on whether a given leaf partitions is
> already known to the subscriber, which means their pg_subscription_rel
> entries would need to be made which we don't currently. Leaf
> partition state entries will not be referred to after all of the known
> ones are synced, because the subsequent changes will be streamed with
> the root's schema. The initial sync worker would need to be taught to
> skip processing any partitioned tables that it sees, because now we'd
> be handling the initial sync part via individual leaf partitions.
>
Yeah, we can do something like this as well but my guess is that this
will be a bit complicated. OTOH, if we adopt Hou-san's patch for the
first problem as described in his previous email then I think the
solution for the second problem could be simpler than what you
described above.
--
With Regards,
Amit Kapila.
From | Date | Subject | |
---|---|---|---|
Next Message | Daniel Gustafsson | 2021-11-03 10:16:26 | Re: remove internal support in pgcrypto? |
Previous Message | Vik Fearing | 2021-11-03 10:07:58 | Re: Should we support new definition for Identity column : GENERATED BY DEFAULT ON NULL? |