Re: Support logical replication of DDLs

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, japin <japinli(at)hotmail(dot)com>, Zheng Li <zhengli10(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, rajesh(dot)rs0541(at)gmail(dot)com, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Support logical replication of DDLs
Date: 2022-05-10 07:02:05
Message-ID: CAD21AoDThyJBFfjjLVFnfMR5L3BfPTWYrotaSJq-=OE3P0GVeA@mail.gmail.com
Lists: pgsql-general pgsql-hackers

On Wed, Apr 13, 2022 at 6:50 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Wed, Apr 13, 2022 at 2:38 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >
> > On Tue, Apr 12, 2022 at 4:25 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > > The *initial* DDL replication is a different problem than DDL replication. The
> > > > former requires a snapshot to read the current catalog data and build a CREATE
> > > > command as part of the subscription process. The subsequent DDLs in that object
> > > > will be handled by a different approach that is being discussed here.
> > > >
> > >
> > > I think they are not completely independent because of the current way
> > > to do initial sync followed by replication. The initial sync and
> > > replication need some mechanism to ensure that one of those doesn't
> > > overwrite the work done by the other. Now, the initial idea and patch
> > > can be developed separately but I think both the patches have some
> > > dependency.
> >
> > I agree with the point that their design can not be completely
> > independent. They have some logical relationship of what schema will
> > be copied by the initial sync and where is the exact boundary from
> > which we will start sending as replication. And suppose first we only
> > plan to implement the replication part then how the user will know
> > what all schema user has to create and what will be replicated using
> > DDL replication? Suppose the user takes a dump and copies all the
> > > schema and then creates the subscription, then how are we going to
> > handle the DDL concurrent to the subscription command?
> >
>
> Right, I also don't see how it can be done in the current
> implementation. So, I think even if we want to develop these two as
> separate patches they need to be integrated to make the solution
> complete.

Developing them separately would be faster but, yes, we will probably
need to integrate them at some point.

I think the initial DDL replication can be done while the relation's
state is SUBREL_STATE_INIT. That is, at the very beginning of table
synchronization, the sync worker copies the table schema somehow and
then starts the initial data copy. After that, the sync worker applies
DML/DDL changes while catching up, and the apply worker applies them
while streaming changes. We could probably make it optional whether to
copy the schema only, the data only, or both.
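
To make the ordering above concrete, here is a rough sketch of it, not
a patch. The state letters match pg_subscription_rel.srsubstate, but
CopyMode, plan_table_sync and the step names are hypothetical and exist
only for illustration; no such subscription option exists today.

```python
from enum import Enum

# State letters as stored in pg_subscription_rel.srsubstate.
SUBREL_STATE_INIT = "i"
SUBREL_STATE_READY = "r"


class CopyMode(Enum):
    """Hypothetical option: what the initial sync should copy."""
    SCHEMA_ONLY = "schema_only"
    DATA_ONLY = "data_only"
    BOTH = "both"


def plan_table_sync(state, mode):
    """Return the ordered (worker, step) pairs for one table.

    Assumption: schema copy happens only in SUBREL_STATE_INIT, before
    the initial data copy, and the sync worker catches up afterwards.
    """
    steps = []
    if state == SUBREL_STATE_INIT:
        if mode in (CopyMode.SCHEMA_ONLY, CopyMode.BOTH):
            steps.append(("sync worker", "copy schema"))
        if mode in (CopyMode.DATA_ONLY, CopyMode.BOTH):
            steps.append(("sync worker", "copy data"))
        # DML/DDL changes made during the copy are applied here.
        steps.append(("sync worker", "catch up"))
    # Once the table is synchronized, the apply worker takes over.
    steps.append(("apply worker", "stream changes"))
    return steps


if __name__ == "__main__":
    for worker, step in plan_table_sync(SUBREL_STATE_INIT, CopyMode.BOTH):
        print(f"{worker}: {step}")
```

A table that is already in the ready state skips the copy steps and
goes straight to streaming by the apply worker.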

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
