Re: pg_get_publication_tables() output duplicate relid

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_get_publication_tables() output duplicate relid
Date: 2021-11-15 10:16:48
Message-ID: CALj2ACX69Pr5HoEfifB_=cjSz-G=eH7MV6L0sjN_YWtwuJX-yA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Nov 15, 2021 at 1:48 PM houzj(dot)fnst(at)fujitsu(dot)com
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> Hi hackers,
>
> In another thread[1], we found the pg_get_publication_tables function will output
> duplicate partition relid when adding both child and parent table to the
> publication(pubviaroot = false).
>
> Example:
> create table tbl1 (a int) partition by range (a);
> create table tbl1_part1 partition of tbl1 for values from (1) to (10);
> create table tbl1_part2 partition of tbl1 for values from (10) to (20);
> create publication pub for table
> tbl1, tbl1_part1 with (publish_via_partition_root=false);
>
> select * from pg_get_publication_tables('pub');
> relid
> -------
> 16387
> 16390
> 16387
>
> The reason of the duplicate output is that:
>
> pg_get_publication_tables invokes function GetPublicationRelations internally.
> In GetPublicationRelations(), it will add the oid of partition 'tbl1_part1'
> twice. First time from extracting partitions from the specified parent table
> 'tbl1', second time from the explicitly specified partition 'tbl1_part1'.
>
>
> I am not sure is this behavior expected as it seems natural for
> pg_get_publication_tables to return duplicate-free relid list. OTOH, there
> seems no harm for the current behavior(duplicate output), it doesn't affect the
> initial sync and change replication when using logical replication.
>
> Personally, I think it might be better to make the output of
> pg_get_publication_tables duplicate-free, because the change happened on each
> output relid will only be replicated once. So, it seems more consistent to
> output each relid only once.
>
> Thoughts ?
>
> (Attach a patch which make the output duplicate-free)
>
> [1] https://www.postgresql.org/message-id/CAJcOf-eURu03QNmD%3D37PtsxuNW4nBGN3G_FdRMBx_tpkeyzDUw%40mail.gmail.com

The users can always specify the distinct clause, see [1]. I don't see
any problem with the existing way. If required you can specify in the
view pg_publication_tables documentation that "This view returns
multiple rows for the same relation, if the relation is child to a
parent partition table and if both the parent and child are specified
in the publication" or something similart.

If at all, the pg_get_publication_tables() is supposed to give unique
outputs, let's fix it and it is a good idea to specify that the view
pg_publication_tables returns unique rows even though child and parent
partition tables are specified in the publication.

I have fdw comments on the patch:
1) Why to do result = list_concat_unique_oid(result, relids); within
the while loop which can make the while loop O(n*n*n)
complexity(list_concat_unique_oid is O(n*n)? Can't you use
list_append_unique, of course this can also be costlier? Let's avoid
adding anything to the while loop and use
list_sort+list_deduplicate_oid (qsort(O(nlogn)+O(n))

/* Now sort and de-duplicate the result list */
list_sort(result, list_oid_cmp);
list_deduplicate_oid(result);

while (HeapTupleIsValid(tup = systable_getnext(scan)))
{
Form_pg_publication_rel pubrel;
+ List *relids = NIL;

pubrel = (Form_pg_publication_rel) GETSTRUCT(tup);
- result = GetPubPartitionOptionRelations(result, pub_partopt,
+ relids = GetPubPartitionOptionRelations(relids, pub_partopt,
pubrel->prrelid);
+
+ result = list_concat_unique_oid(result, relids);
}

2) Are you sure you want GetPublicationRelations to be returning the
unique relations? I'm just thinking why can't you just do
list_sort+list_deduplicate_oid in the caller? I think this makes the
function more restrictive. If required, you can pass a boolean flag
(bool give_unique and specify in the function comments and if passed
true do list_sort+list_deduplicate_oid at the end of
GetPublicationRelations ).

[1]
postgres=# select * from pg_get_publication_tables('pub');
relid
-------
16387
16390
16387
(3 rows)

postgres=# select distinct * from pg_get_publication_tables('pub');
relid
-------
16387
16390
(2 rows)

postgres=# select * from pg_publication_tables;
pubname | schemaname | tablename
---------+------------+------------
pub | public | tbl1_part1
pub | public | tbl1_part2
pub | public | tbl1_part1
(3 rows)

postgres=# select distinct * from pg_publication_tables;
pubname | schemaname | tablename
---------+------------+------------
pub | public | tbl1_part1
pub | public | tbl1_part2
(2 rows)

Regards,
Bharath Rupireddy.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2021-11-15 10:31:41 Re: row filtering for logical replication
Previous Message Daniel Gustafsson 2021-11-15 10:16:24 Re: Partial aggregates pushdown