Re: Added schema level support for publication.

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Subject: Re: Added schema level support for publication.
Date: 2024-12-16 11:04:45
Message-ID: CALDaNm2FY6zFBqEZR+yHX7i=jbjnXZeWa_QOpFsq_+Ghf_CJng@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, 14 Dec 2024 at 05:27, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> [ reviving one aspect of an old thread ]
>
> vignesh C <vignesh21(at)gmail(dot)com> writes:
> > On Mon, Jul 19, 2021 at 9:32 AM tanghy(dot)fnst(at)fujitsu(dot)com <
> > tanghy(dot)fnst(at)fujitsu(dot)com> wrote:
> >> I tested your v12 patch and found a problem in the following case.
> >>
> >> Step 1:
> >> postgres=# create schema s1;
> >> CREATE SCHEMA
> >> postgres=# create table s1.t1 (a int);
> >> CREATE TABLE
> >> postgres=# create publication pub_t for table s1.t1;
> >> CREATE PUBLICATION
> >> postgres=# create publication pub_s for schema s1;
> >> CREATE PUBLICATION
> >>
> >> Step 2:
> >> pg_dump -N s1
> >>
> >> I dumped and excluded schema s1, pg_dump generated the following SQL:
> >> -------------------------------
> >> ALTER PUBLICATION pub_s ADD SCHEMA s1;
> >>
> >> I think it was not expected because SQL like "ALTER PUBLICATION pub_t ADD
> > TABLE s1.t1" was not generated in my case. Thoughts?
>
> > Thanks for reporting this issue, this issue is fixed in the v13 patch
>
> I suppose this exchange is what led to this bit in
> getPublicationNamespaces:
>
> /*
> * We always dump publication namespaces unless the corresponding
> * namespace is excluded from the dump.
> */
> if (nspinfo->dobj.dump == DUMP_COMPONENT_NONE)
> continue;
>
> I'd like to push back against this on three separate grounds:
>
>
> 1. The behavior this produces is extremely non-obvious and not
> adequately explained by the comment, which makes one wonder how
> much of it was intended. For example:
>
> * The public schema will be included if listed in FOR ALL TABLES IN,
> even though it's not dumped explicitly in the dump, because its dump
> mask includes other bits besides DUMP_COMPONENT_DEFINITION. OK, that
> was probably intentional, but you wouldn't know it from the comment.
>
> * Schemas created within extensions will be included if listed in FOR
> ALL TABLES IN, even though they're not dumped explicitly in the dump.
> This seems like a quite accidental by-product of the fact that
> checkExtensionMembership will set DUMP_COMPONENT_ACL on extension
> member objects, thus making their dump mask not NONE. If this
> behavior was intentional, it needs a less-fragile implementation.
>
> * The information_schema will NOT be included, even if it was listed in
> FOR ALL TABLES IN. Admittedly, information_schema doesn't normally
> contain any tables that'd be useful to publish. But still, this seems
> like randomly ignoring the user's intent.
>
>
> 2. The complaint was that if a schema is excluded from the dump
> by --exclude-schema, then it should not get included in the
> publication either. I think this is at best highly debatable:
> arguably it amounts to breaking the publication. It seems
> analogous to deciding that if a function is excluded from the
> dump, while a view using the function is included, we should
> silently adjust the view by removing the output columns or
> WHERE clauses that use the function. I'm pretty sure that
> nobody would think that was sane. Perhaps there's a case for
> excluding the view as a whole, but we don't do that. Besides, the
> corresponding behavior would be to exclude the whole publication,
> not silently modify its definition.
>
>
> 3. The corresponding test for individual tables listed in
> a publication is coded differently:
>
> /*
> * Ignore publication membership of tables whose definitions are not
> * to be dumped.
> */
> if (!(tbinfo->dobj.dump & DUMP_COMPONENT_DEFINITION))
> continue;
>
> This is considerably easier to understand the effects of than a test
> on the whole dump mask: it will list the table if we intend to emit
> CREATE TABLE, and not otherwise, regardless of side issues like ACLs.
> But why is it different from the code for schemas?
>
>
> So I think that this was just wrongly thought through. My
> preference would be to either delete the above-quoted bit in
> getPublicationNamespaces entirely, or make it look like the
> test in getPublicationTables. Or maybe we should delete
> both of these tests, on the grounds that redefining the
> contents of the publication is far outside pg_dump's charter.

We cannot keep the code identical for getPublicationNamespaces and
getPublicationTables because selectDumpableNamespace performs special
handling for the public schema. Specifically, it unsets
DUMP_COMPONENT_DEFINITION for the public namespace, which prevents the
inclusion of 'TABLES IN SCHEMA public' in the publication. That is the
reason we did not keep the code similar to getPublicationTables.
I prefer the other approach to remove both the checks in
getPublicationTables() and getPublicationNamespaces() which also makes
it consistent with the other case that Amit mentioned at [1].

[1] - https://www.postgresql.org/message-id/CAA4eK1%2BZYanA51c9NzKM31AqJSw-j0-edGz91%2BVh-nsoKdzKfQ%40mail.gmail.com

Regards,
Vignesh

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Enes Özdeniz 2024-12-16 11:12:32 Query optimization about cube
Previous Message Amit Kapila 2024-12-16 10:56:38 Re: DOCS: pg_createsubscriber wrong link?