Re: row filtering for logical replication

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: vignesh C <vignesh21(at)gmail(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Önder Kalacı <onderkalaci(at)gmail(dot)com>, japin <japinli(at)hotmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, David Steele <david(at)pgmasters(dot)net>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: row filtering for logical replication
Date: 2021-12-22 05:54:52
Message-ID: CAA4eK1JNRE1dQR_xQT-2pFFHMTXzb=Cf68Dw3N_5swvrz0D8tw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Dec 22, 2021 at 9:23 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Tue, Dec 21, 2021 at 2:29 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> >
> > Here is the v51* patch set:
> >
>
> I tweaked the query slightly based on Euler's changes, the explain
> analyze of the updated query based on Euler's suggestions, existing
> query and Euler's query is given below:
> 1) updated query based on Euler's suggestion:
> explain analyze SELECT DISTINCT pg_get_expr(prqual, prrelid) FROM
> pg_publication p INNER JOIN pg_publication_rel pr ON (p.oid =
> pr.prpubid) WHERE pr.prrelid = 16384 AND p.pubname IN ( 'pub1' )
> AND NOT (select bool_or(puballtables) FROM pg_publication
> WHERE pubname in ( 'pub1' )) AND NOT EXISTS (SELECT 1 FROM
> pg_publication_namespace pn, pg_class c WHERE c.oid = 16384 AND
> c.relnamespace = pn.pnnspid);
> QUERY
> PLAN
> ----------------------------------------------------------------------------------------------------------------------------------------
> Unique (cost=14.68..14.69 rows=1 width=32) (actual time=0.121..0.126
> rows=1 loops=1)
> InitPlan 1 (returns $0)
> -> Aggregate (cost=1.96..1.97 rows=1 width=1) (actual
> time=0.025..0.026 rows=1 loops=1)
> -> Seq Scan on pg_publication (cost=0.00..1.96 rows=1
> width=1) (actual time=0.016..0.017 rows=1 loops=1)
> Filter: (pubname = 'pub1'::name)
> InitPlan 2 (returns $1)
> -> Nested Loop (cost=0.27..8.30 rows=1 width=0) (actual
> time=0.002..0.003 rows=0 loops=1)
> Join Filter: (pn.pnnspid = c.relnamespace)
> -> Seq Scan on pg_publication_namespace pn
> (cost=0.00..0.00 rows=1 width=4) (actual time=0.001..0.002 rows=0
> loops=1)
> -> Index Scan using pg_class_oid_index on pg_class c
> (cost=0.27..8.29 rows=1 width=4) (never executed)
> Index Cond: (oid = '16384'::oid)
> -> Sort (cost=4.40..4.41 rows=1 width=32) (actual
> time=0.119..0.121 rows=1 loops=1)
> Sort Key: (pg_get_expr(pr.prqual, pr.prrelid)) COLLATE "C"
> Sort Method: quicksort Memory: 25kB
> -> Result (cost=0.00..4.39 rows=1 width=32) (actual
> time=0.094..0.098 rows=1 loops=1)
> One-Time Filter: ((NOT $0) AND (NOT $1))
> -> Nested Loop (cost=0.00..4.39 rows=1 width=36)
> (actual time=0.013..0.015 rows=1 loops=1)
> Join Filter: (p.oid = pr.prpubid)
> -> Seq Scan on pg_publication p
> (cost=0.00..1.96 rows=1 width=4) (actual time=0.004..0.005 rows=1
> loops=1)
> Filter: (pubname = 'pub1'::name)
> -> Seq Scan on pg_publication_rel pr
> (cost=0.00..2.41 rows=1 width=40) (actual time=0.005..0.005 rows=1
> loops=1)
> Filter: (prrelid = '16384'::oid)
> Planning Time: 1.014 ms
> Execution Time: 0.259 ms
> (24 rows)
>
> 2) Existing query:
> postgres=# explain analyze SELECT DISTINCT pg_get_expr(prqual,
> prrelid) FROM pg_publication p
> INNER JOIN pg_publication_rel pr ON (p.oid = pr.prpubid) WHERE
> pr.prrelid = 16384 AND p.pubname IN ( 'pub1' )
> AND NOT (select bool_or(puballtables) FROM pg_publication
> WHERE pubname in ( 'pub1' ))
> AND (SELECT count(1)=0 FROM pg_publication_namespace pn,
> pg_class c WHERE c.oid = 16384 AND c.relnamespace = pn.pnnspid);
> QUERY
> PLAN
> -----------------------------------------------------------------------------------------------------------------------------------------
> Unique (cost=14.69..14.70 rows=1 width=32) (actual time=0.162..0.166
> rows=1 loops=1)
> InitPlan 1 (returns $0)
> -> Aggregate (cost=1.96..1.97 rows=1 width=1) (actual
> time=0.023..0.025 rows=1 loops=1)
> -> Seq Scan on pg_publication (cost=0.00..1.96 rows=1
> width=1) (actual time=0.014..0.016 rows=1 loops=1)
> Filter: (pubname = 'pub1'::name)
> InitPlan 2 (returns $1)
> -> Aggregate (cost=8.30..8.32 rows=1 width=1) (actual
> time=0.044..0.045 rows=1 loops=1)
> -> Nested Loop (cost=0.27..8.30 rows=1 width=0) (actual
> time=0.028..0.029 rows=0 loops=1)
> Join Filter: (pn.pnnspid = c.relnamespace)
> -> Seq Scan on pg_publication_namespace pn
> (cost=0.00..0.00 rows=1 width=4) (actual time=0.004..0.004 rows=0
> loops=1)
> -> Index Scan using pg_class_oid_index on pg_class c
> (cost=0.27..8.29 rows=1 width=4) (never executed)
> Index Cond: (oid = '16384'::oid)
> -> Sort (cost=4.40..4.41 rows=1 width=32) (actual
> time=0.159..0.161 rows=1 loops=1)
> Sort Key: (pg_get_expr(pr.prqual, pr.prrelid)) COLLATE "C"
> Sort Method: quicksort Memory: 25kB
> -> Result (cost=0.00..4.39 rows=1 width=32) (actual
> time=0.142..0.147 rows=1 loops=1)
> One-Time Filter: ((NOT $0) AND $1)
> -> Nested Loop (cost=0.00..4.39 rows=1 width=36)
> (actual time=0.016..0.018 rows=1 loops=1)
> Join Filter: (p.oid = pr.prpubid)
> -> Seq Scan on pg_publication p
> (cost=0.00..1.96 rows=1 width=4) (actual time=0.007..0.009 rows=1
> loops=1)
> Filter: (pubname = 'pub1'::name)
> -> Seq Scan on pg_publication_rel pr
> (cost=0.00..2.41 rows=1 width=40) (actual time=0.004..0.004 rows=1
> loops=1)
> Filter: (prrelid = '16384'::oid)
> Planning Time: 0.966 ms
> Execution Time: 0.327 ms
> (25 rows)
>
> 3) Euler’s Query:
> explain analyze SELECT DISTINCT pg_catalog.pg_get_expr(pr.prqual,
> pr.prrelid) FROM pg_catalog.pg_publication p
> INNER JOIN pg_catalog.pg_publication_rel pr ON (p.oid = pr.prpubid)
> WHERE pr.prrelid = 16384 AND p.pubname IN ( 'pub1' )
> AND NOT (SELECT pg_catalog.bool_or(b.puballtables) FROM
> pg_catalog.pg_publication b WHERE b.pubname IN ( 'pub1' ))
> AND NOT EXISTS( SELECT 1 FROM
> pg_catalog.pg_publication_namespace pn INNER JOIN
> pg_catalog.pg_class c ON (pn.pnnspid = c.relnamespace) WHERE
> c.oid = pr.prrelid)
> ;
>
> QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Unique (cost=14.69..14.70 rows=1 width=32) (actual time=0.231..0.236
> rows=1 loops=1)
> InitPlan 1 (returns $0)
> -> Aggregate (cost=1.96..1.97 rows=1 width=1) (actual
> time=0.031..0.032 rows=1 loops=1)
> -> Seq Scan on pg_publication b (cost=0.00..1.96 rows=1
> width=1) (actual time=0.019..0.021 rows=1 loops=1)
> Filter: (pubname = 'pub1'::name)
> -> Sort (cost=12.71..12.72 rows=1 width=32) (actual
> time=0.228..0.231 rows=1 loops=1)
> Sort Key: (pg_get_expr(pr.prqual, pr.prrelid)) COLLATE "C"
> Sort Method: quicksort Memory: 25kB
> -> Result (cost=0.27..12.70 rows=1 width=32) (actual
> time=0.205..0.210 rows=1 loops=1)
> One-Time Filter: (NOT $0)
> -> Nested Loop (cost=0.27..12.70 rows=1 width=36)
> (actual time=0.103..0.107 rows=1 loops=1)
> Join Filter: (pr.prpubid = p.oid)
> -> Nested Loop Anti Join (cost=0.27..10.73
> rows=1 width=40) (actual time=0.093..0.096 rows=1 loops=1)
> Join Filter: (c.oid = pr.prrelid)
> -> Seq Scan on pg_publication_rel pr
> (cost=0.00..2.41 rows=1 width=40) (actual time=0.008..0.009 rows=1
> loops=1)
> Filter: (prrelid = '16384'::oid)
> -> Nested Loop (cost=0.27..8.30 rows=1
> width=4) (actual time=0.079..0.080 rows=0 loops=1)
> Join Filter: (pn.pnnspid = c.relnamespace)
> -> Index Scan using
> pg_class_oid_index on pg_class c (cost=0.27..8.29 rows=1 width=8)
> (actual time=0.069..0.072 rows=1 loops=1)
> Index Cond: (oid = '16384'::oid)
> -> Seq Scan on
> pg_publication_namespace pn (cost=0.00..0.00 rows=1 width=4) (actual
> time=0.005..0.005 rows=0 loops=1)
> -> Seq Scan on pg_publication p
> (cost=0.00..1.96 rows=1 width=4) (actual time=0.007..0.007 rows=1
> loops=1)
> Filter: (pubname = 'pub1'::name)
> Planning Time: 1.067 ms
> Execution Time: 0.431 ms
> (25 rows)
>
> Combining existing query to include NOT EXISTS based on Euler's
> changes seems to be better:
> SELECT DISTINCT pg_get_expr(prqual, prrelid) FROM pg_publication p
> INNER JOIN pg_publication_rel pr ON (p.oid = pr.prpubid)
> WHERE pr.prrelid = 16384 AND p.pubname IN ( 'pub1' )
> AND NOT (select bool_or(puballtables)
> FROM pg_publication
> WHERE pubname in ( 'pub1' ))
> AND NOT EXISTS (SELECT 1
> FROM pg_publication_namespace pn, pg_class c
> WHERE c.oid = 16384 AND c.relnamespace = pn.pnnspid);
>

The modified query proposed by you seems better to me based on time.

--
With Regards,
Amit Kapila.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2021-12-22 06:13:27 Re: In-placre persistance change of a relation
Previous Message Greg Stark 2021-12-22 05:46:21 Re: Unifying VACUUM VERBOSE and log_autovacuum_min_duration output