Re: Logical Replication WIP

From: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Logical Replication WIP
Date: 2016-09-07 00:56:23
Message-ID: d11f2754-e760-c774-20b3-1b460e1efca1@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Review of 0002-Add-SUBSCRIPTION-catalog-and-DDL.patch:

(As you had already mentioned, some of the review items in 0001 apply
analogously here.)

Changes needed to compile:

--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -218,7 +218,7 @@ CreateSubscription(CreateSubscriptionStmt *stmt)
CatalogUpdateIndexes(rel, tup);
heap_freetuple(tup);

- ObjectAddressSet(myself, SubscriptionRelationId, suboid);
+ ObjectAddressSet(myself, SubscriptionRelationId, subid);

heap_close(rel, RowExclusiveLock);

This is fixed later in patch 0005.

--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -6140,8 +6140,7 @@ <title><structname>pg_subscription</structname>
Columns</title>
<entry><structfield>subpublications</structfield></entry>
<entry><type>text[]</type></entry>
<entry></entry>
- <entry>Array of subscribed publication names. For more on
publications
- see <xref linkend="publications">.
+ <entry>Array of subscribed publication names.
</entry>
</row>
</tbody>

I don't see that id defined in any later patch.

Minor problems:

Probably unintentional change in pg_dump.h:

- * The PublicationInfo struct is used to represent publications.
+ * The PublicationInfo struct is used to represent publication.

pg_subscription column "dbid" rename to "subdbid".

I think subpublications ought to be of type name[], not text[].

It says, a subscription can only be dropped by its owner or a
superuser. But subscriptions don't have owners. Maybe they should.

On the CREATE SUBSCRIPTION ref page,

| INITIALLY ( ENABLED | DISABLED )

should use {} instead.

We might want to add ALTER commands to rename subscriptions and
publications.

Similar concerns as before about ALTER syntax, e.g., does ALTER
SUBSCRIPTION ... PUBLICATION add to or replace the publication set?

For that matter, why is there no way to add?

Document why publicationListToArray() creates its own memory context.

I think we should allow creating subscriptions initally without
publications. This could be useful for example to test connections,
or create slots before later on adding publications. Seeing that
there is support for changing the publications later, this shouldn't
be a problem.

The synopsis of CREATE SUBSCRIPTION indicates that options are
optional, but it actually requires at least one option.

At the end of CreateSubscription(), the CommandCounterIncrement()
doesn't appear to be necessary (yet, see patch 0005?).

Maybe check for duplicates in the publications list.

Larger conceptual issues:

I haven't read the rest of the code yet to understand why
pg_subscription needs to be a shared catalog, but be that as it may, I
would still make it so that subscription names appear local to the
database. We already have the database OID in the pg_subscription
catalog, so I would make the key (subname, subdatid). DDL commands
would only operate on subscriptions in the current database (or you
could use "datname"."subname" syntax), and \drs would only show
subscriptions in the current database. That way the shared catalog is
an implementation detail that can be changed in the future. I think
it would be very helpful for users if publications and subscriptions
appear to work in a parallel way. If I have two databases that I want
to replicate between two servers, I might want to have a publication
"mypub" in each database and a subscription "mysub" in each database.
If I learn that the subscriptions can't be named that way, then I have
to go back to rename to the publications, and it'll all be a bit
frustrating.

Some thoughts on pg_dump and such:

Even an INITIALLY DISABLED subscription needs network access to create
the replication slot. So restoring a dump when the master is not
available will have some difficulties. And restoring master and slave
at the same time (say disaster recovery) will not necessarily work
well either. Also, the general idea of doing network access during a
backup restore without obvious prior warning sounds a bit unsafe.

I imagine maybe having three states for subscriptions: DISABLED,
PREPARED, ENABLED (to use existing key words). DISABLED just exists
in the catalog, PREPARED has the slots set up, ENABLED starts
replicating. So you can restore a dump with all slots disabled. And
then it might be good to have a command to "prepare" or "enable" all
subscriptions at once.

That command would also help if you restore a dump not in a
transaction but you want to enable all subscriptions in the same
transaction.

I'd also prefer having subscriptions dumped by default, just to keep
it so that pg_dump by default backs up everything.

Finally, having disabled subscriptions without network access would
also allow writing some DDL command tests.

As I had mentioned privately before, I would perhaps have CREATE
SUBSCRIPTION use the foreign server infrastructure for storing
connection information.

We'll have to keep thinking about ways to handle abandonded
replication slots. I imagine that people will want to create
subscriber instances in fully automated ways. If that fails every so
often and requires manual cleanup of replication slots on the master
some of the time, that will get messy. I don't have well-formed ideas
about this, though.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2016-09-07 01:30:59 Re: ICU integration
Previous Message Peter Geoghegan 2016-09-07 00:52:27 Re: Parallel tuplesort (for parallel B-Tree index creation)