Re: Finalizing logical replication limitations as well as potential features

From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Finalizing logical replication limitations as well as potential features
Date: 2018-01-04 20:25:38
Message-ID: 104a4c3a-6e4f-e76e-ab83-9d0399d5dfa6@commandprompt.com
Lists: pgsql-hackers

On 12/21/2017 06:15 PM, Craig Ringer wrote:
> On 22 December 2017 at 05:24, Joshua D. Drake <jd(at)commandprompt(dot)com> wrote:
>
> -Hackers,
>
>
> Lastly, I noted that a full sync of a replication set is performed
> by a COPY. This is fine for small sets, but for a large data set
> that takes some time to copy it may be a problem for overall
> performance and maintenance. We may want to see if we can do an
> initial sync incrementally (optional) via a cursor (?) and queue
> all changed rows until the sync completes?
>
>
> I'm not sure I understand this.
>
> The COPY is streamed from source to destination; IIRC it's not
> buffering to a tempfile or anything. So I fail to see what using a
> cursor would gain you. No matter whether you're using a cursor, a
> COPY, or something else, you have to hold down a specific xmin and
> work with the same snapshot for the whole sync operation. If you
> instead did something like incremental SELECTs, each with a new
> xmin+snapshot, across ranges of a PK your copy would see changes from
> different points in time depending on where in the copy it was up to,
> and you'd get an inconsistent view. It could possibly be worked around
> with some tricky key-range-based filtering of the applied
> change-stream if you were willing to require that no PK updates may
> occur, but it'd probably be bug city. It's hard enough to get sync
> correct at all.
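
To make the concern above concrete, here is a toy sketch in Python of a chunked copy that reads each PK range at a different point in time and does no change-stream filtering. Everything in it is hypothetical (an in-memory dict stands in for the table, and the function names are made up for illustration); it is not how the logical replication code works.

```python
from typing import Callable, Dict, List, Tuple


def chunked_read_without_snapshot(
    table: Dict[int, int],
    chunk_bounds: List[Tuple[int, int]],
    between_chunks: Callable[[int, Dict[int, int]], None],
) -> Dict[int, int]:
    """Copy `table` one PK range at a time, each read seeing the table as it
    is at that moment (no shared snapshot, no replay of changes)."""
    copied: Dict[int, int] = {}
    for i, (lo, hi) in enumerate(chunk_bounds):
        # Each chunk is read from the live table, i.e. with a "new snapshot".
        copied.update({pk: v for pk, v in table.items() if lo <= pk < hi})
        # Concurrent writers get a chance to run between chunk reads.
        between_chunks(i, table)
    return copied


def demo_inconsistent_copy() -> Tuple[Dict[int, int], Dict[int, int]]:
    # Two accounts in different PK ranges; the total is always 200.
    accounts = {1: 100, 10: 100}

    def transfer(i: int, table: Dict[int, int]) -> None:
        # After the first chunk is copied, move 50 from account 1 to account 10.
        if i == 0:
            table[1] -= 50
            table[10] += 50

    copied = chunked_read_without_snapshot(accounts, [(0, 5), (5, 20)], transfer)
    return accounts, copied
```

The copy ends up with account 1 from before the transfer and account 10 from after it, a state that never existed on the source at any single point in time.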

I am not sure that this is entirely true. Granted, it is easiest just to
do everything within a single snapshot, but we shouldn't have to. It would
be possible to perform incremental (even parallel) syncs, whether via COPY
or some other mechanism. We would have to track changes to the table as we
sync, but that isn't impossible either (especially if we have a PK). I would
think that this would only be valid for async replication, but it is
possible. We just queue/audit the changes as they happen and apply the
queued changes after the initial sync completes. Multi-phase sync baby :D
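
A rough sketch of the idea in Python, purely as an illustration. The names (ToySource, multi_phase_sync) and the in-memory dict "table" are hypothetical, and the sketch assumes every table has a PK, that changes are queued from before the first chunk is read, and that replay is an idempotent upsert/delete by PK applied in order:

```python
from typing import Callable, Dict, List, Optional, Tuple

# (operation, primary key, new row value or None for deletes)
Change = Tuple[str, int, Optional[str]]


class ToySource:
    """Hypothetical source table that records every change in a queue."""

    def __init__(self, rows: Dict[int, str]) -> None:
        self.rows = dict(rows)
        self.queue: List[Change] = []

    def upsert(self, pk: int, row: str) -> None:
        self.rows[pk] = row
        self.queue.append(("upsert", pk, row))

    def delete(self, pk: int) -> None:
        self.rows.pop(pk, None)
        self.queue.append(("delete", pk, None))

    def read_range(self, lo: int, hi: int) -> Dict[int, str]:
        # Reads the live table; no snapshot is held across chunks.
        return {pk: row for pk, row in self.rows.items() if lo <= pk < hi}


def multi_phase_sync(
    source: ToySource,
    chunk_bounds: List[Tuple[int, int]],
    between_chunks: Optional[Callable[[int, ToySource], None]] = None,
) -> Dict[int, str]:
    target: Dict[int, str] = {}
    start = len(source.queue)  # phase 1: remember where queueing begins
    for i, (lo, hi) in enumerate(chunk_bounds):
        target.update(source.read_range(lo, hi))  # phase 2: copy by PK range
        if between_chunks is not None:
            between_chunks(i, source)  # simulate concurrent writers
    # Phase 3: replay everything queued since the start, in order. Replay is
    # idempotent, so changes already visible in a copied chunk are harmless.
    for op, pk, row in source.queue[start:]:
        if op == "upsert":
            target[pk] = row
        else:
            target.pop(pk, None)
    return target


def demo_multi_phase_sync() -> Tuple[Dict[int, str], Dict[int, str]]:
    source = ToySource({1: "a", 2: "b", 3: "c", 4: "d", 5: "e", 6: "f"})

    def writers(i: int, src: ToySource) -> None:
        if i == 0:
            src.upsert(2, "b2")   # row already copied
            src.delete(5)         # row not yet copied
            src.upsert(9, "new")  # row outside every chunk range
        else:
            src.upsert(4, "d2")   # change after the last chunk

    target = multi_phase_sync(source, [(1, 4), (4, 7)], writers)
    return source.rows, target
```

In this toy the target converges on the source once the queue is drained. It says nothing about the hard parts (transaction boundaries, ordering across parallel workers, or where the queue lives).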

Thanks,

JD

--
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc

PostgreSQL centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://postgresconf.org
***** Unless otherwise stated, opinions are my own. *****
