Re: Added schema level support for publication.

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: vignesh C <vignesh21(at)gmail(dot)com>
Cc: "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Subject: Re: Added schema level support for publication.
Date: 2024-12-13 23:57:52
Message-ID: 1270733.1734134272@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

[ reviving one aspect of an old thread ]

vignesh C <vignesh21(at)gmail(dot)com> writes:
> On Mon, Jul 19, 2021 at 9:32 AM tanghy(dot)fnst(at)fujitsu(dot)com <
> tanghy(dot)fnst(at)fujitsu(dot)com> wrote:
>> I tested your v12 patch and found a problem in the following case.
>>
>> Step 1:
>> postgres=# create schema s1;
>> CREATE SCHEMA
>> postgres=# create table s1.t1 (a int);
>> CREATE TABLE
>> postgres=# create publication pub_t for table s1.t1;
>> CREATE PUBLICATION
>> postgres=# create publication pub_s for schema s1;
>> CREATE PUBLICATION
>>
>> Step 2:
>> pg_dump -N s1
>>
>> I dumped and excluded schema s1, pg_dump generated the following SQL:
>> -------------------------------
>> ALTER PUBLICATION pub_s ADD SCHEMA s1;
>>
>> I think it was not expected because SQL like "ALTER PUBLICATION pub_t ADD
> TABLE s1.t1" was not generated in my case. Thoughts?

> Thanks for reporting this issue, this issue is fixed in the v13 patch

I suppose this exchange is what led to this bit in
getPublicationNamespaces:

/*
* We always dump publication namespaces unless the corresponding
* namespace is excluded from the dump.
*/
if (nspinfo->dobj.dump == DUMP_COMPONENT_NONE)
continue;

I'd like to push back against this on three separate grounds:

1. The behavior this produces is extremely non-obvious and not
adequately explained by the comment, which makes one wonder how
much of it was intended. For example:

* The public schema will be included if listed in FOR ALL TABLES IN,
even though it's not dumped explicitly in the dump, because its dump
mask includes other bits besides DUMP_COMPONENT_DEFINITION. OK, that
was probably intentional, but you wouldn't know it from the comment.

* Schemas created within extensions will be included if listed in FOR
ALL TABLES IN, even though they're not dumped explicitly in the dump.
This seems like a quite accidental by-product of the fact that
checkExtensionMembership will set DUMP_COMPONENT_ACL on extension
member objects, thus making their dump mask not NONE. If this
behavior was intentional, it needs a less-fragile implementation.

* The information_schema will NOT be included, even if it was listed in
FOR ALL TABLES IN. Admittedly, information_schema doesn't normally
contain any tables that'd be useful to publish. But still, this seems
like randomly ignoring the user's intent.

2. The complaint was that if a schema is excluded from the dump
by --exclude-schema, then it should not get included in the
publication either. I think this is at best highly debatable:
arguably it amounts to breaking the publication. It seems
analogous to deciding that if a function is excluded from the
dump, while a view using the function is included, we should
silently adjust the view by removing the output columns or
WHERE clauses that use the function. I'm pretty sure that
nobody would think that was sane. Perhaps there's a case for
excluding the view as a whole, but we don't do that. Besides, the
corresponding behavior would be to exclude the whole publication,
not silently modify its definition.

3. The corresponding test for individual tables listed in
a publication is coded differently:

/*
* Ignore publication membership of tables whose definitions are not
* to be dumped.
*/
if (!(tbinfo->dobj.dump & DUMP_COMPONENT_DEFINITION))
continue;

This is considerably easier to understand the effects of than a test
on the whole dump mask: it will list the table if we intend to emit
CREATE TABLE, and not otherwise, regardless of side issues like ACLs.
But why is it different from the code for schemas?

So I think that this was just wrongly thought through. My
preference would be to either delete the above-quoted bit in
getPublicationNamespaces entirely, or make it look like the
test in getPublicationTables. Or maybe we should delete
both of these tests, on the grounds that redefining the
contents of the publication is far outside pg_dump's charter.

BTW, the discussion that caused me to notice this is at [1].
I'd come to the conclusion that doing something on the basis of
"dobj->dump == DUMP_COMPONENT_NONE" is probably an anti-pattern.

regards, tom lane

[1] https://www.postgresql.org/message-id/flat/CAKNkYnwXFBf136%3Du9UqUxFUVagevLQJ%3DzGd5BsLhCsatDvQsKQ%40mail.gmail.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2024-12-14 00:06:16 Re: Exceptional md.c paths for recovery and zero_damaged_pages
Previous Message Andres Freund 2024-12-13 23:44:22 Exceptional md.c paths for recovery and zero_damaged_pages