Re: Added schema level support for publication.

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Subject: Re: Added schema level support for publication.
Date: 2024-12-16 05:57:34
Message-ID: CAA4eK1+ZYanA51c9NzKM31AqJSw-j0-edGz91+Vh-nsoKdzKfQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Dec 14, 2024 at 5:28 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> I suppose this exchange is what led to this bit in
> getPublicationNamespaces:
>
> /*
> * We always dump publication namespaces unless the corresponding
> * namespace is excluded from the dump.
> */
> if (nspinfo->dobj.dump == DUMP_COMPONENT_NONE)
> continue;
>
> I'd like to push back against this on three separate grounds:
>
>
> 1. The behavior this produces is extremely non-obvious and not
> adequately explained by the comment, which makes one wonder how
> much of it was intended. For example:
>
> * The public schema will be included if listed in FOR ALL TABLES IN,
> even though it's not dumped explicitly in the dump, because its dump
> mask includes other bits besides DUMP_COMPONENT_DEFINITION. OK, that
> was probably intentional, but you wouldn't know it from the comment.
>
> * Schemas created within extensions will be included if listed in FOR
> ALL TABLES IN, even though they're not dumped explicitly in the dump.
> This seems like a quite accidental by-product of the fact that
> checkExtensionMembership will set DUMP_COMPONENT_ACL on extension
> member objects, thus making their dump mask not NONE. If this
> behavior was intentional, it needs a less-fragile implementation.
>
> * The information_schema will NOT be included, even if it was listed in
> FOR ALL TABLES IN. Admittedly, information_schema doesn't normally
> contain any tables that'd be useful to publish. But still, this seems
> like randomly ignoring the user's intent.
>
>
> 2. The complaint was that if a schema is excluded from the dump
> by --exclude-schema, then it should not get included in the
> publication either. I think this is at best highly debatable:
> arguably it amounts to breaking the publication. It seems
> analogous to deciding that if a function is excluded from the
> dump, while a view using the function is included, we should
> silently adjust the view by removing the output columns or
> WHERE clauses that use the function. I'm pretty sure that
> nobody would think that was sane. Perhaps there's a case for
> excluding the view as a whole, but we don't do that. Besides, the
> corresponding behavior would be to exclude the whole publication,
> not silently modify its definition.
>
>
> 3. The corresponding test for individual tables listed in
> a publication is coded differently:
>
> /*
> * Ignore publication membership of tables whose definitions are not
> * to be dumped.
> */
> if (!(tbinfo->dobj.dump & DUMP_COMPONENT_DEFINITION))
> continue;
>
> This is considerably easier to understand the effects of than a test
> on the whole dump mask: it will list the table if we intend to emit
> CREATE TABLE, and not otherwise, regardless of side issues like ACLs.
> But why is it different from the code for schemas?
>
>
> So I think that this was just wrongly thought through. My
> preference would be to either delete the above-quoted bit in
> getPublicationNamespaces entirely, or make it look like the
> test in getPublicationTables. Or maybe we should delete
> both of these tests, on the grounds that redefining the
> contents of the publication is far outside pg_dump's charter.
>

I see a merit in your second suggestion which is to delete these tests
in getPublicationTables() and getPublicationNamespaces() because we
follow similar behavior in the somewhat related subscription case as
well. When a subscription points to a set of publications and we use
'--no-publications' option in pg_dump, it still dumps the
subscription. I tried it with the following test:

Publisher:
postgres=# create schema s1;
CREATE SCHEMA
postgres=# create table t1(c1 int);
CREATE TABLE
postgres=# create publication pub1 for table t1;
CREATE PUBLICATION
postgres=# create publication pub2 for tables in schema s1;
CREATE PUBLICATION

Subscriber:
postgres=# create table t1 (c1 int);
CREATE TABLE
postgres=# create publication pub_on_sub_1 for table t1;
CREATE PUBLICATION
postgres=# create subscription sub1 connection 'dbname = postgres'
publication pub1, pub2;
NOTICE: created replication slot "sub1" on publisher
CREATE SUBSCRIPTION

Now when I performed the dump with '--no-publications' option on the
subscriber node, it didn't include publications which is expected but
did include a subscription definition pointing to the publications as
defined originally.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2024-12-16 06:15:27 Re: Pass ParseState as down to utility functions.
Previous Message Tom Lane 2024-12-16 05:23:14 Re: Regression tests fail on OpenBSD due to low semmns value