Re: Support logical replication of DDLs

From: "Euler Taveira" <euler(at)eulerto(dot)com>
To: "Amit Kapila" <amit(dot)kapila16(at)gmail(dot)com>, japin <japinli(at)hotmail(dot)com>
Cc: "Zheng Li" <zhengli10(at)gmail(dot)com>, "Alvaro Herrera" <alvherre(at)alvh(dot)no-ip(dot)org>, "Dilip Kumar" <dilipbalaut(at)gmail(dot)com>, rajesh(dot)rs0541(at)gmail(dot)com, "PostgreSQL Hackers" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Support logical replication of DDLs
Date: 2022-04-11 12:46:06
Message-ID: 45d0d97c-3322-4054-b94f-3c08774bbd90@www.fastmail.com

On Mon, Apr 11, 2022, at 2:00 AM, Amit Kapila wrote:
> On Thu, Apr 7, 2022 at 3:46 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Wed, Mar 23, 2022 at 10:39 AM Japin Li <japinli(at)hotmail(dot)com> wrote:
> >
> > 2. For DDL replication, do we need to wait for a consistent snapshot
> > point? For DMLs, that is a convenient point from which to initialize
> > replication, which is why we export a snapshot there; the snapshot is
> > then used to read the existing data. Do we have any similar need for
> > DDL replication?
> >
>
> I have thought a bit more about this. For DML replication we need to
> build the snapshot because we need to read catalog tables to decode
> the corresponding WAL, but it is not clear to me whether we have a
> similar requirement for DDL replication. If catalog access is
> required, then it makes sense to follow the current snapshot model;
> otherwise, we may need to think differently for DDL replication.
>
> One more related point: for DML replication, we ensure that we copy
> the table's entire data (via the initial sync), including data that
> existed before the publication for that table was created. Do we want
> to do something similar for DDLs? How do we sync the schema of a table
> that was created before the user defined the publication? Say the
> table was created before the publication is defined and after that
> there are only Alter statements; do we expect users to create the
> table on the subscriber, after which we replicate the Alter
> statements? Even if we do that, it won't be clear which Alter
> statements will be replicated after the publication is defined,
> especially if those Alters happen concurrently with defining the
> publication.
The *initial* DDL replication (schema synchronization) is a different problem
from ongoing DDL replication. The former requires a snapshot to read the
current catalog data and build a CREATE command as part of the subscription
process. Subsequent DDL on that object will be handled by the approach being
discussed in this thread.
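
To make the snapshot question concrete, here is a rough sketch of how the
existing per-table initial sync obtains a consistent starting point today
(simplified; the built-in tablesync worker drives this over the walsender
protocol, and the slot, snapshot, and table names below are only
placeholders). The open question above is whether reading the catalogs to
build the CREATE command needs the same kind of snapshot.

    -- On a replication connection, creating a logical slot at a consistent
    -- point can export a snapshot name:
    --   CREATE_REPLICATION_SLOT "sub1_sync" LOGICAL pgoutput EXPORT_SNAPSHOT;

    -- In a regular session, that snapshot is used to read the data that
    -- already existed when the slot reached consistency:
    BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
    SET TRANSACTION SNAPSHOT '00000003-00000002-1';  -- name returned above
    COPY public.t TO STDOUT;                          -- initial data copy
    COMMIT;
    -- Changes after the slot's consistent point are then streamed from the
    -- slot, so nothing is missed or applied twice.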

I'm planning to work on the initial DDL replication. I'll open a new thread as
soon as I have written a design for it. As an example, the pglogical approach
is to use pg_dump behind the scenes to provide the schema [1]. That is a
reasonable approach, but an optimal solution would be an API that provides the
initial DDL commands; the main point of this feature is to have an API that
returns the DDL needed to create an object, which logical replication can use
for initial schema synchronization. Such a "DDL to create an object" facility
was already discussed in the past [2].
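
For comparison, this is roughly the manual workflow that such an API (or the
pg_dump-based approach) would automate; the table, publication, subscription,
and connection names below are only examples:

    -- Publisher: the table and the publication already exist.
    CREATE TABLE public.t (id int PRIMARY KEY, val text);
    CREATE PUBLICATION pub FOR TABLE public.t;

    -- Subscriber: today the schema has to be created by hand (for example
    -- from "pg_dump --schema-only" output) before the subscription exists.
    CREATE TABLE public.t (id int PRIMARY KEY, val text);
    CREATE SUBSCRIPTION sub
        CONNECTION 'host=publisher dbname=postgres'
        PUBLICATION pub;  -- copy_data defaults to true, so existing rows are synced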

[1] https://github.com/2ndQuadrant/pglogical/blob/REL2_x_STABLE/pglogical_sync.c#L942
[2] https://www.postgresql.org/message-id/4E69156E.5060509%40dunslane.net

--
Euler Taveira
EDB https://www.enterprisedb.com/
