From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: "Kumar, Sachin" <ssetiya(at)amazon(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Initial Schema Sync for Logical Replication
Date: 2023-03-29 15:18:04
Message-ID: CAD21AoANtgtqSavuhCn6Q3Qigogb05tKQ6mAQgVKhfZ0ysFrRw@mail.gmail.com
Lists: pgsql-hackers
On Wed, Mar 29, 2023 at 7:57 PM Kumar, Sachin <ssetiya(at)amazon(dot)com> wrote:
>
> > > > > From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> > > > > > I think we won't be able to use the same snapshot because the
> > > > > > transaction will be committed.
> > > > > > In CreateSubscription() we can use the transaction snapshot from
> > > > > > walrcv_create_slot() until walrcv_disconnect() is called. (I am
> > > > > > not sure about this part; maybe walrcv_disconnect() commits
> > > > > > internally?)
> > > > > > So somehow we need to keep this snapshot alive even after the
> > > > > > transaction is committed (or delay committing the transaction,
> > > > > > but we can have CREATE SUBSCRIPTION with ENABLED=FALSE, so we
> > > > > > can have a restart before tablesync is able to use the same
> > > > > > snapshot).
> > > > > >
> > > > >
> > > > > Can we think of getting the table data as well, along with the
> > > > > schema, via pg_dump? Won't both the schema and the initial data
> > > > > then correspond to the same snapshot?
> > > >
> > > > Right, that will work, thanks!
> > >
> > > While it works, we cannot get the initial data in parallel, no?
> > >
>
> I was thinking each tablesync process would call pg_dump --table. This way, if we have N
> tablesync processes, we can have N pg_dump --table=table_name invocations running in parallel.
> In fact, we can use --schema-only to get just the schema and then let COPY take care of the
> data syncing. We would use the same snapshot for pg_dump as well as for the COPY of the table.
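
For reference, a rough sketch of that flow (the slot name, snapshot name,
and table name below are made up for illustration; note that the exported
snapshot stays usable only while the slot-creation transaction on the
walsender connection keeps it alive, which is exactly the lifetime problem
discussed above):

  -- On the walsender (replication) connection; creating the slot exports a
  -- snapshot name, e.g. '00000003-00000002-1':
  CREATE_REPLICATION_SLOT sub1_sync_t1 LOGICAL pgoutput (SNAPSHOT 'export');

  -- Schema sync via pg_dump, pinned to that snapshot:
  --   pg_dump --schema-only --table=public.t1 \
  --           --snapshot=00000003-00000002-1 srcdb | psql dstdb

  -- Initial data copy in another session, pinned to the same snapshot:
  BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
  SET TRANSACTION SNAPSHOT '00000003-00000002-1';
  COPY public.t1 TO STDOUT;
  COMMIT;
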
How can we postpone creating the pg_subscription_rel entries until the
tablesync worker starts and does the schema sync? Since a
pg_subscription_rel entry needs the table's OID, I think we need either
to do the schema sync before creating the entry (i.e., during CREATE
SUBSCRIPTION) or to postpone creating the entries as Amit proposed[1].
The apply worker needs the list of tables to sync in order to launch the
tablesync workers, but the table schemas have to be created before that
information is available.
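
(Just to illustrate the dependency: pg_subscription_rel keys each tracked
table by its local OID, so the row cannot exist before the table itself
does. The subscription name below is hypothetical.)

  -- srrelid is the local table's OID; srsubstate tracks its sync state.
  SELECT sr.srrelid::regclass AS tbl, sr.srsubstate
  FROM pg_subscription_rel sr
  JOIN pg_subscription s ON s.oid = sr.srsubid
  WHERE s.subname = 'mysub';
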
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com