From: | "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com> |
---|---|
To: | "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com> |
Cc: | Amit Langote <amitlangote09(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Subject: | RE: Data is copied twice when specifying both child and parent table in publication |
Date: | 2022-04-21 03:05:13 |
Message-ID: | OS3PR01MB6275F9606924FC06F555B1999EF49@OS3PR01MB6275.jpnprd01.prod.outlook.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Apr 19, 2022 4:53 PM Shi, Yu/侍 雨 <shiy(dot)fnst(at)cn(dot)fujitsu(dot)com> wrote:
> On Tue, Apr 19, 2022 3:05 PM houzj(dot)fnst(at)fujitsu(dot)com <houzj(dot)fnst(at)fujitsu(dot)com>
> wrote:
> >
> > > -----Original Message-----
> > > From: Wang, Wei/王 威 <wangw(dot)fnst(at)fujitsu(dot)com>
> > On Thursday, April 7, 2022 11:08 AM
> > >
> > > On Thur, Mar 10, 2021 at 10:08 AM houzj(dot)fnst(at)fujitsu(dot)com wrote:
> > > > Hi,
> > > >
> > > > When reviewing some logical replication related features. I noticed
> another
> > > > possible problem if the subscriber subscribes multiple publications which
> > > > publish parent and child table.
> > > >
> > > > For example:
> > > >
> > > > ----pub
> > > > create table t (a int, b int, c int) partition by range (a);
> > > > create table t_1 partition of t for values from (1) to (10);
> > > >
> > > > create publication pub1 for table t
> > > > with (PUBLISH_VIA_PARTITION_ROOT);
> > > > create publication pub2 for table t_1
> > > > with (PUBLISH_VIA_PARTITION_ROOT);
> > > >
> > > > ----sub
> > > > ---- prepare table t and t_1
> > > > CREATE SUBSCRIPTION sub CONNECTION 'port=10000 dbname=postgres'
> > > > PUBLICATION pub1, pub2;
> > > >
> > > > select * from pg_subscription_rel ;
> > > > srsubid | srrelid | srsubstate | srsublsn
> > > > ---------+---------+------------+-----------
> > > > 16391 | 16385(t) | r | 0/150D100
> > > > 16391 | 16388(t_1) | r | 0/150D138
> > > >
> > > > If subscribe two publications one of them publish parent table with
> > > > (pubviaroot=true) and another publish child table. Both the parent table
> and
> > > > child table will exist in pg_subscription_rel which also means we will do
> > > > initial copy for both tables.
> > > >
> > > > But after initial copy, we only publish change with the schema of the
> parent
> > > > table(t). It looks a bit inconsistent.
> > > >
> > > > Based on the document of PUBLISH_VIA_PARTITION_ROOT option. I think
> > > the
> > > > expected behavior could be we only store the top most parent(table t) in
> > > > pg_subscription_rel and do initial copy for it if pubviaroot is on. I haven't
> > > > thought about how to fix this and will investigate this later.
> > > Hi,
> > > I try to fix this bug. Attach the patch.
> > >
> > > The current HEAD get table list for one publication by invoking function
> > > pg_get_publication_tables. If multiple publications are subscribed, then this
> > > function is invoked multiple times. So option
> PUBLISH_VIA_PARTITION_ROOT
> > > works
> > > independently on every publication, I think it does not work correctly on
> > > different publications of the same subscription.
> > >
> > > So I fix this bug by the following two steps:
> > > First step,
> > > I get oids of subscribed tables by publication list. Then for tables with the
> > > same topmost root table, I filter them base on the option
> > > PUBLISH_VIA_PARTITION_ROOT(see new function filter_partitions_oids).
> > > After filtering, I get the final oid list.
> > > Second step,
> > > I get the required informations(nspname and relname) base on the oid list
> of
> > > first step.
> >
> > Thanks for updating the patch.
> > I confirmed that the bug is fixed by this patch.
> >
> > One suggestion is that can we simplify the code by moving the logic of
> checking
> > the ancestor into the SQL ?. For example, we could filter the outpout of
> > pg_publication_tables by adding A WHERE clause which checks whether the
> table
> > is a partition and if its ancestor is also in the output. I think we can also
> > filter the needless partition in this approach.
> >
>
> I agreed with you and I tried to fix this problem in a simpler way. What we want
> is to exclude the partitioned table whose ancestor is also need to be
> replicated, so how about implementing that by using the following SQL when
> getting the table list from publisher?
>
> SELECT DISTINCT ns.nspname, c.relname
> FROM pg_catalog.pg_publication_tables t
> JOIN pg_catalog.pg_namespace ns ON ns.nspname = t.schemaname
> JOIN pg_catalog.pg_class c ON c.relname = t.tablename AND c.relnamespace =
> ns.oid
> WHERE t.pubname IN ('p0','p2')
> AND (c.relispartition IS FALSE OR NOT EXISTS (SELECT 1 FROM
> pg_partition_ancestors(c.oid)
> WHERE relid IN ( SELECT DISTINCT (schemaname||'.'||tablename)::regclass::oid
> FROM pg_catalog.pg_publication_tables t
> WHERE t.pubname IN ('p0','p2') ) AND relid != c.oid));
>
> Please find the attached patch which used this approach, I also merged the test
> in Wang's patch into it.
Thanks for your review and patch.
I think the approach of v2 is better than v1. It does not increase the query.
Only move the test cases from 100_bugs.pl to 013_partition.pl and simplify it.
Attach the new patch.
Regards,
Wang wei
Attachment | Content-Type | Size |
---|---|---|
v3-0001-Fix-data-replicated-twice-when-specifying-PUBLISH.patch | application/octet-stream | 4.9 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2022-04-21 03:09:19 | Re: DataRow message for Integer(int4) returns result as text? |
Previous Message | Peter Geoghegan | 2022-04-21 03:00:03 | Re: More problems with VacuumPageHit style global variables |