Re: Added schema level support for publication.

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Subject: Re: Added schema level support for publication.
Date: 2024-12-18 07:21:24
Message-ID: CALDaNm1TQqBC5ZP5BsNf2LKVu1kEJNJn2spFwbAtyLn1FoAFGQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 16 Dec 2024 at 11:27, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Sat, Dec 14, 2024 at 5:28 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >
> > I suppose this exchange is what led to this bit in
> > getPublicationNamespaces:
> >
> > /*
> > * We always dump publication namespaces unless the corresponding
> > * namespace is excluded from the dump.
> > */
> > if (nspinfo->dobj.dump == DUMP_COMPONENT_NONE)
> > continue;
> >
> > I'd like to push back against this on three separate grounds:
> >
> >
> > 1. The behavior this produces is extremely non-obvious and not
> > adequately explained by the comment, which makes one wonder how
> > much of it was intended. For example:
> >
> > * The public schema will be included if listed in FOR ALL TABLES IN,
> > even though it's not dumped explicitly in the dump, because its dump
> > mask includes other bits besides DUMP_COMPONENT_DEFINITION. OK, that
> > was probably intentional, but you wouldn't know it from the comment.
> >
> > * Schemas created within extensions will be included if listed in FOR
> > ALL TABLES IN, even though they're not dumped explicitly in the dump.
> > This seems like a quite accidental by-product of the fact that
> > checkExtensionMembership will set DUMP_COMPONENT_ACL on extension
> > member objects, thus making their dump mask not NONE. If this
> > behavior was intentional, it needs a less-fragile implementation.
> >
> > * The information_schema will NOT be included, even if it was listed in
> > FOR ALL TABLES IN. Admittedly, information_schema doesn't normally
> > contain any tables that'd be useful to publish. But still, this seems
> > like randomly ignoring the user's intent.
> >
> >
> > 2. The complaint was that if a schema is excluded from the dump
> > by --exclude-schema, then it should not get included in the
> > publication either. I think this is at best highly debatable:
> > arguably it amounts to breaking the publication. It seems
> > analogous to deciding that if a function is excluded from the
> > dump, while a view using the function is included, we should
> > silently adjust the view by removing the output columns or
> > WHERE clauses that use the function. I'm pretty sure that
> > nobody would think that was sane. Perhaps there's a case for
> > excluding the view as a whole, but we don't do that. Besides, the
> > corresponding behavior would be to exclude the whole publication,
> > not silently modify its definition.
> >
> >
> > 3. The corresponding test for individual tables listed in
> > a publication is coded differently:
> >
> > /*
> > * Ignore publication membership of tables whose definitions are not
> > * to be dumped.
> > */
> > if (!(tbinfo->dobj.dump & DUMP_COMPONENT_DEFINITION))
> > continue;
> >
> > This is considerably easier to understand the effects of than a test
> > on the whole dump mask: it will list the table if we intend to emit
> > CREATE TABLE, and not otherwise, regardless of side issues like ACLs.
> > But why is it different from the code for schemas?
> >
> >
> > So I think that this was just wrongly thought through. My
> > preference would be to either delete the above-quoted bit in
> > getPublicationNamespaces entirely, or make it look like the
> > test in getPublicationTables. Or maybe we should delete
> > both of these tests, on the grounds that redefining the
> > contents of the publication is far outside pg_dump's charter.
> >
>
> I see a merit in your second suggestion which is to delete these tests
> in getPublicationTables() and getPublicationNamespaces() because we
> follow similar behavior in the somewhat related subscription case as
> well. When a subscription points to a set of publications and we use
> '--no-publications' option in pg_dump, it still dumps the
> subscription.

Currently, the table and schema publications for the
information_schema are not included in the dump. As a result, the
information_schema contents are not published after restoring the
dump. This issue was pointed out by Tom Lane in [1].
Here’s an example to demonstrate the problem:
-- Create table in information_schema
CREATE TABLE information_schema.t1(c1 int);
-- Create a table publication whose table is in information_schema
CREATE PUBLICATION pub1 FOR TABLE information_schema.t1;
-- Create a schema publication on information_schema
CREATE PUBLICATION pub2 FOR TABLES IN SCHEMA information_schema ;

When performing a pg_dump, the following information_schema table and
schema statemetns are not included in the dump, though they should be:
ALTER PUBLICATION pub1 ADD TABLE ONLY information_schema.t1;
ALTER PUBLICATION pub2 ADD TABLES IN SCHEMA information_schema;

The issue arises because we explicitly set the dump bitmask to
DUMP_COMPONENT_NONE for the information_schema schema in
selectDumpableNamespace, which also affects the schema's tables.
Later, the checks in the functions getPublicationNamespaces and
getPublicationTables identify that the bitmask is not set, causing
these publication entries to be skipped.
This issue is fixed by removing the schema and table dump bitmask
check, which ensures that information_schema publications are included
in the dump. Additionally, this patch aligns the behavior with the
following scenarios: a)Subscriptions include publications even when
--no-publication is specified, as Amit pointed out in [2]. b) Views
include the dump of a user-defined function, which is not dumped by
default, as mentioned in [3].
The attached patch has the changes for the same.

[1] - https://www.postgresql.org/message-id/1270733.1734134272%40sss.pgh.pa.us
[2] - https://www.postgresql.org/message-id/CAA4eK1%2BZYanA51c9NzKM31AqJSw-j0-edGz91%2BVh-nsoKdzKfQ%40mail.gmail.com
[3] - https://www.postgresql.org/message-id/CALDaNm1ZHfZ9ET9fJxhLWCcSr0-hhi3R_sEupoLPzAWRLngujw%40mail.gmail.com

Regards,
Vignesh

Attachment Content-Type Size
v1-0001-Include-information_schema-publications-and-exclu.patch text/x-patch 4.9 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Laurenz Albe 2024-12-18 07:26:31 Re: Final result (display) collation?
Previous Message John Naylor 2024-12-18 07:12:13 Re: Fix crash when non-creator being an iteration on shared radix tree