From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Craig Ringer <craig(dot)ringer(at)enterprisedb(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>
Subject: Re: Single transaction in the tablesync worker?
Date: 2020-12-10 09:49:13
Message-ID: CAHut+PvWpY0bN7XofG+XRHejn-Ya4SSgdZG1ysv2oe1iY7LeOw@mail.gmail.com
Lists: pgsql-hackers
On Tue, Dec 8, 2020 at 9:14 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Tue, Dec 8, 2020 at 11:53 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> >
> > Yes, I observed this same behavior.
> >
> > IIUC the only way for the tablesync worker to go from CATCHUP mode to
> > SYNCDONE is via the call to process_syncing_tables().
> >
> > But a side-effect of this is that when messages arrive during this
> > CATCHUP phase, one transaction will already be getting handled by the
> > tablesync worker before process_syncing_tables() is ever reached.
> >
> > I have created and attached a simple patch which allows the tablesync
> > worker to detect if there is anything to do *before* it enters the
> > apply main loop. Calling process_syncing_tables() before the apply
> > main loop offers a quick way out, so the message handling will not be
> > split unnecessarily between the workers.
> >
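To make that concrete, the essence of the patch is this (sketch only,
not the attached patch verbatim; names follow the PG13-era worker.c,
and "last_received" is illustrative):

    /* tablesync worker: before entering the apply main loop, check
     * whether the sync is already done, so we can exit via SYNCDONE
     * instead of first applying a transaction inside the loop */
    if (am_tablesync_worker())
        process_syncing_tables(last_received);

    LogicalRepApplyLoop(last_received);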
>
> Yeah, this demonstrates the idea can work, but as mentioned in my
> previous email [1] this needs much more work to make the COPY and the
> later fetching of changes from the publisher consistent. So, let me
> summarize the discussion so far. We wanted to enhance the tablesync
> phase of Logical Replication to enable decoding of prepared
> transactions [2]. The problem was that when we stream prepared
> transactions to the tablesync worker, it simply commits them, due to
> the requirement of maintaining a single transaction for the entire
> duration of the copy and the streaming of transactions afterward. We
> can't simply disable the decoding of prepared xacts for tablesync
> workers because that can skip some of the prepared xacts forever on
> the subscriber, as explained in one of the emails above [3]. Now,
> while investigating the solutions to enhance tablesync to support
> decoding at prepare time, I found that due to the current design of
> tablesync we can see partial data of transactions on the subscriber,
> which is also explained in the email above with an example [4]. This
> visibility problem has existed since Logical Replication was
> introduced in PostgreSQL, and the only answer I have received till now
> is that there doesn't seem to be any other alternative, which I think
> is not true; I have provided one alternative as well.
>
> Next, we have discussed three different solutions, all of which solve
> the first problem (allow the tablesync worker to decode transactions
> at prepare time), and one of which solves both the first and the
> second problem (partial transaction data visibility).
>
> Solution-1: Allow the tablesync worker to use multiple transactions.
> The reason for doing it in a single transaction is that if we commit
> after the initial COPY and then crash while streaming changes of other
> transactions, the state of the table won't be known after the restart:
> since we are using a temporary slot, we don't know from where to
> restart syncing the table.
>
> IIUC, we primarily need to do two things to achieve multiple
> transactions. One is to have an additional state in the catalog (say
> catchup) which will say that the initial copy is done. Then we need a
> permanent slot with which we can track the progress, so that after a
> restart (due to a crash, connection break, etc.) we can start from the
> appropriate position. Now, this will allow us to do less work after
> recovering from a crash because we will know the restart point. As
> Craig mentioned, it also allows the sync slot to advance, freeing any
> held upstream resources before the whole sync is done, which is good
> if the upstream is busy and generating lots of WAL. Finally,
> committing as we go means we won't exceed the CID increment limit in a
> single transaction.
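For illustration, the catalog side of this might look as follows (the
new state's name and comment are only placeholders; whatever it ends up
being called, it records that the initial COPY has committed):

    /* pg_subscription_rel.h -- existing states (abridged) */
    #define SUBREL_STATE_INIT      'i'  /* initializing */
    #define SUBREL_STATE_DATASYNC  'd'  /* data is being synchronized */
    #define SUBREL_STATE_SYNCDONE  's'  /* synchronization finished */
    #define SUBREL_STATE_READY     'r'  /* ready */

    /* hypothetical new state: the initial COPY has been committed, so
     * after a crash the worker resumes streaming from the permanent
     * slot's position instead of redoing the COPY */
    #define SUBREL_STATE_COPYDONE  'f'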
>
> Solution-2: The next solution we discussed is to make the tablesync
> worker just gather messages after the COPY, in a similar way to how
> the current streaming of in-progress transactions gathers messages
> into a "changes" file, so that they can be replayed later by the apply
> worker. Here, as we don't need to replay the individual transactions
> in the tablesync worker in a single transaction, it will allow us to
> send the decoded prepare to the subscriber. This has some
> disadvantages: each transaction processed by the tablesync worker
> needs to be durably written to a file, and it can also lead to some
> apply lag later when the apply worker processes the same changes.
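As I understand it, the tablesync worker's post-COPY loop would then be
shaped roughly like this (illustrative pseudocode only; the helper
names are invented, loosely modelled on the streaming "changes" file
machinery in worker.c):

    /* tablesync worker, after the COPY has finished (pseudocode) */
    while (!caught_up_with_apply_worker())
    {
        StringInfo  msg = receive_next_message();

        /* don't apply anything here; just spool the message durably,
         * the way in-progress streamed transactions are spooled */
        write_to_changes_file(msg);
    }

    /* the apply worker replays the spooled file later, so the
     * tablesync worker needs no long-lived transaction and a
     * two-phase PREPARE can be passed through as-is */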
>
> Solution-3: Allow the tablesync workers to just perform the initial
> COPY, and then, once the COPY is done for all relations, the apply
> worker will stream all the future changes. Surely, if large copies are
> required for multiple relations then the apply worker's replay of
> transactions would be delayed a bit, but I don't know how much that
> matters compared to the transaction visibility issue, and in any case
> we would still achieve the maximum parallelism by allowing the copy
> via multiple workers. This would reduce the load (both CPU and I/O) on
> the publisher side by decoding the data only once, instead of once for
> each tablesync worker and separately for the apply worker. I think it
> will use fewer resources to finish the work.
>
> Currently, in the tablesync worker, we create a slot with the
> CRS_USE_SNAPSHOT option, which creates a transaction snapshot on the
> publisher, and then we use the same snapshot for the COPY from the
> publisher. After this, when we receive the data from the publisher
> using the same slot, it will be in sync with the COPY. I think that to
> keep the same consistency between the COPY and the data we receive
> from the publisher in this approach, we need to export the snapshot
> while creating the slot in the apply worker by using
> CRS_EXPORT_SNAPSHOT, and then have all the tablesync workers doing the
> copy use that same snapshot. In the tablesync workers, we can use the
> SET TRANSACTION SNAPSHOT command after "BEGIN READ ONLY ISOLATION
> LEVEL REPEATABLE READ" to use the exported snapshot. That way the COPY
> will use the same snapshot as is used for receiving the changes in the
> apply worker, and the data will be in sync.
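Concretely, each tablesync worker's publisher connection could do
something like this before its COPY (sketch; error handling omitted,
and the snapshot name shown is a made-up example of what the apply
worker's CREATE_REPLICATION_SLOT ... EXPORT_SNAPSHOT would return):

    /* run the COPY under the snapshot exported by the apply worker */
    walrcv_exec(wrconn,
                "BEGIN READ ONLY ISOLATION LEVEL REPEATABLE READ",
                0, NULL);
    walrcv_exec(wrconn,
                "SET TRANSACTION SNAPSHOT '00000003-00000002-1'",
                0, NULL);
    /* ... COPY ... TO STDOUT runs here, seeing the slot's snapshot ... */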
>
> Then we also need a way to export a snapshot while the apply worker is
> already receiving changes, because users can run 'ALTER SUBSCRIPTION
> name REFRESH PUBLICATION', which allows new tables to be synced. I
> think we need to introduce a new command in exec_replication_command()
> to export a snapshot from the existing slot, and then have the new
> tablesync worker use it.
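Something like the following, say (the command name is invented; no
such walsender command exists today):

    /* hypothetical new case in exec_replication_command():
     *
     *     EXPORT_SNAPSHOT_FOR_SLOT slot_name
     *
     * which would build a snapshot consistent with the slot's current
     * position and export it, so a tablesync worker launched after
     * REFRESH PUBLICATION can pick it up via SET TRANSACTION SNAPSHOT */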
>
>
> Among the above three solutions, the first two will solve the first
> problem (allow the tablesync worker to decode transactions at prepare
> time) and the third solution will solve both the first and the second
> problem (partial transaction data visibility). The third solution
> requires quite some redesign of how the Logical Replication work is
> synchronized between the apply and tablesync workers and might turn
> out to be a bigger implementation effort. I am tentatively thinking of
> going with the first or second solution at this stage; if later people
> feel that we need some bigger redesign, then we can go with something
> along the lines of Solution-3.
>
> Thoughts?
>
> [1] - https://www.postgresql.org/message-id/CAA4eK1%2BQC74wRQmbYT%2BMmOs%3DYbdUjuq0_A9CBbVoQMB1Ryi-OA%40mail.gmail.com
> [2] - https://www.postgresql.org/message-id/CAHut+PuEMk4SO8oGzxc_ftzPkGA8uC-y5qi-KRqHSy_P0i30DA@mail.gmail.com
> [3] - https://www.postgresql.org/message-id/CAA4eK1KFsjf6x-S7b0dJLvEL3tcn9x-voBJiFoGsccyH5xgDzQ%40mail.gmail.com
> [4] - https://www.postgresql.org/message-id/CAA4eK1Ld9XaLoTZCoKF_gET7kc1fDf8CPR3CM48MQb1N1jDLYg%40mail.gmail.com
>
> --
Hi Amit,
- Solution-3 has become too complicated for me to attempt. Anyway, we
may be better off just focusing on eliminating the new problems exposed
by the 2PC work [1], rather than burning too much effort to fix some
other quirk which has apparently existed for years.
[1] https://www.postgresql.org/message-id/CAHut%2BPtm7E5Jj92tJWPtnnjbNjJN60_%3DaGGKYW3h23b7J%3DqeDg%40mail.gmail.com
- Solution-2 has some potential lag problems, and maybe file resource
problems as well. This idea did not get a very favourable response
when I first proposed it.
- This leaves Solution-1 as the most viable option to fix the currently
known 2PC trouble.
~~
So I will try to write a patch for the proposed Solution-1.
---
Kind Regards,
Peter Smith.
Fujitsu Australia