RE: Initial Schema Sync for Logical Replication

From: "Kumar, Sachin" <ssetiya(at)amazon(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: Initial Schema Sync for Logical Replication
Date: 2023-07-07 09:16:01
Message-ID: a01ec64c94a5481d9e9508f95f18b709@amazon.com
Lists: pgsql-hackers

> From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
> So I've implemented a different approach; doing schema synchronization at a
> CREATE SUBSCRIPTION time. The backend executing CREATE SUBSCRIPTION
> uses pg_dump and restores the table schemas including both partitioned tables
> and their partitions regardless of publish_via_partition_root option, and then
> creates pg_subscription_rel entries for tables while respecting
> publish_via_partition_root option.
>
> There is a window between table creations and the tablesync workers starting to
> process the tables. If DDLs are executed in this window, the tablesync worker
> might fail because the table schema might have already been changed. We need
> to mention this note in the documentation. BTW, I think we will be able to get
> rid of this downside if we support DDL replication. DDLs executed in the window
> are applied by the apply worker and it takes over the data copy to the tablesync
> worker at a certain LSN.

I don’t think we will be able to get rid of this window even with DDL replication.
There are some issues:
1. Even with the tablesync worker taking over at a certain LSN, the publisher can make more changes until
the tablesync worker acquires a lock on the publisher table via COPY.
2. How will we make sure that the apply worker has caught up with all the changes from the publisher
before it starts the tablesync worker? It can lag behind the publisher.

I think the easiest option would be to just recreate the table; this way we don’t have to worry about
complex race conditions. Tablesync already creates a slot for copying data, so we can use the same slot to
get the up-to-date table definition. Dropping the table won't be expensive since there won't be any data
in it. The apply worker will skip all the DDLs/DMLs until the table is synced.

For partitioned tables, though, we will be able to keep up with schema changes of the published table only when
publish_via_partition_root is true.
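
To make the proposed flow concrete, here is a minimal Python sketch of it. This is a toy model, not PostgreSQL code: the names Change, Subscriber, tablesync and apply are hypothetical, LSNs are plain integers, and a table is modeled as a list whose first entry is its definition. It assumes the apply worker skips a change when the table is not yet synced, or when the change is at or before the LSN of the tablesync snapshot.

```python
from dataclasses import dataclass, field


@dataclass
class Change:
    lsn: int       # position of the change in the publisher's stream
    table: str
    kind: str      # "DDL" or "DML"; both are treated the same way here
    payload: str


@dataclass
class Subscriber:
    # table name -> [definition, applied change, ...]
    tables: dict = field(default_factory=dict)
    # table name -> LSN of the tablesync snapshot; absent means "not synced yet"
    synced_at: dict = field(default_factory=dict)

    def tablesync(self, table, snapshot_lsn, definition):
        """Drop and recreate the table from the definition seen at snapshot_lsn."""
        self.tables[table] = [definition]  # the old, still empty table is discarded
        self.synced_at[table] = snapshot_lsn

    def apply(self, change):
        """Apply one change; return False if it was skipped."""
        sync_lsn = self.synced_at.get(change.table)
        if sync_lsn is None or change.lsn <= sync_lsn:
            # Not synced yet, or already reflected in the recreated table.
            return False
        self.tables[change.table].append(change.payload)
        return True


sub = Subscriber()
sub.tables["t"] = ["schema v1"]  # created by the initial schema sync
skipped = sub.apply(Change(10, "t", "DDL", "ALTER TABLE t ADD COLUMN c int"))
sub.tablesync("t", snapshot_lsn=20, definition="schema v2")
applied = sub.apply(Change(25, "t", "DML", "INSERT INTO t VALUES (1, 2)"))
```

In this model the DDL at LSN 10 is never applied by the apply worker; its effect reaches the subscriber only through the recreated definition, which is why a stale table created at CREATE SUBSCRIPTION time does not matter.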

Regards
Sachin
Amazon Web Services: https://aws.amazon.com
