Re: Added schema level support for publication.

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: vignesh C <vignesh21(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Added schema level support for publication.
Date: 2021-01-11 06:15:47
Message-ID: CALj2ACWAxO3vSToT0o5nXL=rz5cNx90zaV-at=cvM14Tag4=cQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Jan 10, 2021 at 11:21 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> On Sat, Jan 9, 2021 at 8:08 PM Bharath Rupireddy
> <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> > I think this feature can be useful, in case a user has a lot of tables
> > to publish inside a schema. Having said that, I wonder if this feature
> > mandates users to create the same schema with same
> > permissions/authorizations manually on the subscriber, because logical
> > replication doesn't propagate any ddl's so are the schema or schema
> > changes? Or is it that the list of tables from the publisher can go
> > into a different schema on the subscriber?
> >
>
> DDL's will not be propagated to the subscriber. Users have to create
> the schema & tables in the subscriber. No change in
> Permissions/authorizations handling, it will be the same as the
> existing behavior for relations.

Looks like the existing behaviour already requires users to create the
schema on the subscriber when publishing the tables from that schema.
Otherwise, an error is thrown on the subscriber [1].

[1] on publisher:
CREATE SCHEMA myschema;
CREATE TABLE myschema.t1(a1 int, b1 int);
INSERT INTO myschema.t1_myschema SELECT i, i+10 FROM generate_series(1,10) i;
CREATE PUBLICATION testpub FOR TABLE myschema.t1;

on subscriber:
postgres=# CREATE SUBSCRIPTION testsub CONNECTION 'host=localhost
dbname=postgres user=bharath port=5432' PUBLICATION testpub;
ERROR: schema "myschema" does not exist
CREATE SCHEMA myschema;
CREATE TABLE myschema.t1(a1 int, b1 int);
postgres=# CREATE SUBSCRIPTION testsub CONNECTION 'host=localhost
dbname=postgres user=bharath port=5432' PUBLICATION testpub;
NOTICE: created replication slot "testsub" on publisher
CREATE SUBSCRIPTION

> > Since the schema can have other objects such as data types, functions,
> > operators, I'm sure with your feature, non-table objects will be
> > skipped.
> >
>
> Yes, only table data will be sent to subscribers, non-table objects
> will be skipped.

Looks like the existing CREATE PUBLICATION FOR ALL TABLES, which is
for all the tables in the database, does this i.e. skips non-table
objects and temporary tables, foreign tables and so on. So, your
feature also can behave the same way, but within the scope of the
given schema/s.

> > As Amit pointed out earlier, the behaviour when schema dropped, I
> > think we should also consider when schema is altered, say altered to a
> > different name, maybe we should change that in the publication too.
> >
>
> I agree that when schema is altered the renamed schema should be
> reflected in the publication.

I think, it's not only making sure that the publisher side has the new
altered schema, but also the subscriber needs those alters. Having
said that, since these alters come under DDL changes and in logical
replication we don't publish the scheme changes to the subscriber, we
may not need to anything extra for informing the schema alters to the
subscriber from the publisher, the users might have to do the same
schema alter on the subscriber and then a ALTER SUBSCRIPTION testsub
REFRESH PUBLICATION; should work for them? If this understanding is
correct, then we should document this.

> > In general, what happens if we have some temporary tables or foreign
> > tables inside the schema, will they be allowed to send the data to
> > subscribers?
> >
>
> Temporary tables & foreign tables will not be added to the publications.

Yes the existing logical replication framework doesn't allow
replication of temporary, unlogged, foreign tables and other non-table
relations such as materialized views, indexes etc [1]. The CREATE
PUBLICATION statement either fails in check_publication_add_relation
or before that.

CREATE PUBLICATION testpub FOR TABLE tab1, throwing the error if the
single table tab1 is any of the above restricted tables, seems fine.
But, if there's a list of tables with CREATE PUBLICATION testpub FOR
TABLE normal_tab1, temp_tab2, normal_tab3, foreign_tab4,
unlogged_tab5, normal_tab6, normal_tab7 ......; This query fails on
first encounter of the restricted table, say at temp_tab2. Whereas,
CREATE PUBLICATION testpub FOR ALL TABLES; would skip the restricted
tables and continue to add the accepted tables into the publication
within the database.

IMHO, if there's a list of tables specified with FOR TABLE, then
instead of throwing an error in case of any restricted table, we can
issue a warning and continue with the addition of accepted tables into
the publication. If done, this behaviour will be in sync with FOR ALL
TABLES;

Thoughts? If okay, I can work on a patch.

Related to error messages: when foreign table is specified in CREATE
PUBLICATION statement, then "ERROR: "f1" is not a table", is thrown
[1], how about the error message "ERROR: foerign table "f1" cannot be
replicated". In general, it would be good if we could have the error
messages something like in [2] instead of the existing [1].

Thoughts? If okay, I can work on a patch.

[1]
t1 is a temporary table:
postgres=# CREATE PUBLICATION testpub FOR TABLE t1;
ERROR: table "t1" cannot be replicated
DETAIL: Temporary and unlogged relations cannot be replicated.

t1 is an unlogged table:
postgres=# CREATE PUBLICATION testpub FOR TABLE t1;
ERROR: table "t1" cannot be replicated
DETAIL: Temporary and unlogged relations cannot be replicated.

f1 is a foreign table:
postgres=# CREATE PUBLICATION testpub FOR TABLE f1;
ERROR: "f1" is not a table
DETAIL: Only tables can be added to publications.

mv1 is a mat view:
postgres=# CREATE PUBLICATION testpub FOR TABLE mv1;
ERROR: "mv1" is not a table

idx1 is an index:
postgres=# CREATE PUBLICATION testpub FOR TABLE idx1;
ERROR: "idx1" is an index

[2]
t1 is a temporary table:
postgres=# CREATE PUBLICATION testpub FOR TABLE t1;
ERROR: temporary table "t1" cannot be replicated
DETAIL: Temporary, unlogged and foreign relations cannot be replicated.

t1 is an unlogged table:
postgres=# CREATE PUBLICATION testpub FOR TABLE t1;
ERROR: unlogged table "t1" cannot be replicated
DETAIL: Temporary, unlogged and foreign relations cannot be replicated.

f1 is a foreign table:
postgres=# CREATE PUBLICATION testpub FOR TABLE f1;
ERROR: foreign table "f1" cannot be replicated
DETAIL: Temporary, unlogged and foreign relations cannot be replicated.

> > Say a user has created a publication for a schema with hundreds of
> > tables in it, at some point later, can he stop replicating a single or
> > some tables from that schema?
> >
>
> There is no provision for this currently.

The documentation [1] says, we can ALTER PUBLICATION testpub DROP
TABLE t1; which removes the table from the list of published tables,
but looks like it requires ALTER SUBSCRIPTION testsub REFRESH
PUBLICATION; for the changes to become effective on the subscriber. I
have done some testing for this case:
1) created publication for table t1, see \d+ t1, the associated
publication is visible in the output
2) created subscription on the subscriber, initial available data from
the publisher for table t1 is received
3) insert into table t1 on the publisher
4) inserted data in (3) is received in the subscriber table t1
5) alter publication to drop the table t1 on the publisher, see \d+
t1, there will not be any associated publication in the output
6) execute alter subscription refresh publication on the subscriber,
with the expectation that it should not receive the data from the
publisher for the table t1 since it's dropped from the publication in
(5)
7) insert into table t1 on the publisher
8) still the newly inserted data in (7) from the publisher, will be
received into the table t1 in the subscriber

IIUC, the behaviour of ALTER PUBLICATION DROP TABLE from the docs and
the above use case, it looks like a bug to me. If I'm wrong, can
someone correct me?

Thoughts?

[1] - https://www.postgresql.org/docs/devel/sql-alterpublication.html

> > IMO, it's better to have the syntax - CREATE PUBLICATION
> > production_publication FOR ALL TABLES IN SCHEMA production - just
> > added IN between for all tables and schema.
> >
>
> I'm ok with the proposed syntax, I would like others' opinion too
> before making the change.

Thanks.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrey Borodin 2021-01-11 06:22:05 Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding
Previous Message Masahiko Sawada 2021-01-11 05:39:36 Re: [PATCH] Feature improvement for CLOSE, FETCH, MOVE tab completion