From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | vignesh C <vignesh21(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Hou, Zhijie/侯 志杰 <houzj(dot)fnst(at)fujitsu(dot)com>, "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org> |
Subject: | Re: Logical Replication of sequences |
Date: | 2024-06-18 01:59:29 |
Message-ID: | CAD21AoCX07dOB6t78cuoU4fioLkPOBgGFPVwLPa8=DvRLF3Y6w@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, Jun 14, 2024 at 4:04 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Thu, Jun 13, 2024 at 6:14 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Thu, Jun 13, 2024 at 7:06 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Thu, Jun 13, 2024 at 1:09 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > >
> > > > On Wed, Jun 12, 2024 at 6:59 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > > >
> > > > >
> > > > > Yeah, starting with a single worker sounds good for now. Do you think
> > > > > we should sync all the sequences in a single transaction or have some
> > > > > threshold value above which a different transaction would be required
> > > > > or maybe a different sequence sync worker altogether? Now, having
> > > > > multiple sequence-sync workers requires some synchronization so that
> > > > > only a single worker is allocated for one sequence.
> > > > >
> > > > > The simplest thing is to use a single sequence sync worker that syncs
> > > > > all sequences in one transaction but with a large number of sequences,
> > > > > it could be inefficient. OTOH, I am not sure if it would be a problem
> > > > > in reality.
> > > >
> > > > I think that we can start with using a single worker and one
> > > > transaction, and measure the performance with a large number of
> > > > sequences.
> > > >
> > >
> > > Fair enough. However, this raises the question Dilip and Vignesh are
> > > discussing whether we need a new relfilenode for sequence update even
> > > during initial sync? As per my understanding, the idea is that similar
> > > to tables, the CREATE SUBSCRIPTION command (with copy_data = true)
> > > will create the new sequence entries in pg_subscription_rel with the
> > > state as 'i'. Then the sequence-sync worker would start a transaction
> > > and one-by-one copy the latest sequence values for each sequence (that
> > > has state as 'i' in pg_subscription_rel) and mark its state as ready
> > > 'r' and commit the transaction. Now if there is an error during this
> > > operation it will restart the entire operation. The idea of creating a
> > > new relfilenode is to handle the error so that if there is a rollback,
> > > the sequence state will be rolled back to 'i' and the sequence value
> > > will also be rolled back. The other option could be that we update the
> > > sequence value without a new relfilenode and if the transaction rolled
> > > back then only the sequence's state will be rolled back to 'i'. This
> > > would work with a minor inconsistency that sequence values will be
> > > up-to-date even when the sequence state is 'i' in pg_subscription_rel.
> > > I am not sure if that matters because anyway, they can quickly be
> > > out-of-sync with the publisher again.
> >
> > I think it would be fine in many cases even if the sequence value is
> > up-to-date even when the sequence state is 'i' in pg_subscription_rel.
> > But the case we would like to avoid is where suppose the sequence-sync
> > worker does both synchronizing sequence values and updating the
> > sequence states for all sequences in one transaction, and if there is
> > an error we end up retrying the synchronization for all sequences.
> >
>
> The one idea to avoid this is to update sequences in chunks (say 100
> or some threshold number of sequences in one transaction). Then we
> would only redo the sync for the last and pending set of sequences.
That could be one idea.
>
> > >
> > > Now, say we don't want to maintain the state of sequences for initial
> > > sync at all then after the error how will we detect if there are any
> > > pending sequences to be synced? One possibility is that we maintain a
> > > subscription level flag 'subsequencesync' in 'pg_subscription' to
> > > indicate whether sequences need sync. This flag would indicate whether
> > > to sync all the sequences in pg_susbcription_rel. This would mean that
> > > if there is an error while syncing the sequences we will resync all
> > > the sequences again. This could be acceptable considering the chances
> > > of error during sequence sync are low. The benefit is that both the
> > > REFRESH PUBLICATION SEQUENCES and CREATE SUBSCRIPTION can use the same
> > > idea and sync all sequences without needing a new relfilenode. Users
> > > can always refer 'subsequencesync' flag in 'pg_subscription' to see if
> > > all the sequences are synced after executing the command.
> >
> > I think that REFRESH PUBLICATION {SEQUENCES} can be executed even
> > while the sequence-sync worker is synchronizing sequences. In this
> > case, the worker might not see new sequences added by the concurrent
> > REFRESH PUBLICATION {SEQUENCES} command since it's already running.
> > The worker could end up marking the subsequencesync as completed while
> > not synchronizing these new sequences.
> >
>
> This is possible but we could avoid REFRESH PUBLICATION {SEQUENCES} by
> not allowing to change the subsequencestate during the time
> sequence-worker is syncing the sequences. This could be restrictive
> but there doesn't seem to be cases where user would like to
> immediately refresh sequences after creating the subscription.
I'm concerned that users would not be able to add sequences during the
time the sequence-worker is syncing the sequences. For example,
suppose we have 10000 sequences and execute REFRESH PUBLICATION
{SEQUENCES} to synchronize 10000 sequences. Now if we add one sequence
to the publication and want to synchronize it to the subscriber, we
have to wait for the current REFRESH PUBLICATION {SEQUENCES} to
complete, and then execute it again, synchronizing 10001 sequences,
instead of synchronizing only the new one.
>
> > >
> > > > > > Or yet another idea I came up with is that a tablesync worker will
> > > > > > synchronize both the table and sequences owned by the table. That is,
> > > > > > after the tablesync worker caught up with the apply worker, the
> > > > > > tablesync worker synchronizes sequences associated with the target
> > > > > > table as well. One benefit would be that at the time of initial table
> > > > > > sync being completed, the table and its sequence data are consistent.
> > > >
> > > > Correction; it's not guaranteed that the sequence data and table data
> > > > are consistent even in this case since the tablesync worker could get
> > > > on-disk sequence data that might have already been updated.
> > > >
> > >
> > > The benefit of this approach is not clear to me. Our aim is to sync
> > > all sequences before the upgrade, so not sure if this helps because
> > > anyway both table values and corresponding sequences can again be
> > > out-of-sync very quickly.
> >
> > Right.
> >
> > Given that our aim is to sync all sequences before the upgrade, do we
> > need to synchronize sequences even at CREATE SUBSCRIPTION time? In
> > cases where there are a large number of sequences, synchronizing
> > sequences in addition to tables could be overhead and make less sense,
> > because sequences can again be out-of-sync quickly and typically
> > CREATE SUBSCRIPTION is not created just before the upgrade.
> >
>
> I think for the upgrade one should be creating a subscription just
> before the upgrade. Isn't something similar is done even in the
> upgrade steps you shared once [1]?
I might be missing something but in the blog post they created
subscriptions in various ways, waited for the initial table data sync
to complete, and then set the sequence values with a buffer based on
the old cluster. What I imagined with this sequence synchronization
feature is that after the initial table sync completes, we stop to
execute further transactions on the publisher, synchronize sequences
using REFRESH PUBLICATION {SEQUENCES}, and resume the application to
execute transactions on the subscriber. So a subscription would be
created just before the upgrade, but sequence synchronization would
not necessarily happen at the same time of the initial table data
synchronization.
> Typically users should get all the
> data from the publisher before the upgrade of the publisher via
> creating a subscription. Also, it would be better to keep the
> implementation of sequences close to tables wherever possible. Having
> said that, I understand your point as well and if you strongly feel
> that we don't need to sync sequences at the time of CREATE
> SUBSCRIPTION and others also don't see any problem with it then we can
> consider that as well.
I see your point that it's better to keep the implementation of
sequences close to the table one. So I agree that we can start with
this approach, and we will see how it works in practice and consider
other options later.
>
> > > >
> > >
> > > Do you mean that sync the sequences during the REFRESH PUBLICATION
> > > SEQUENCES command itself? If so, there is an argument that we can do
> > > the same during CREATE SUBSCRIPTION. It would be beneficial to keep
> > > the method to sync the sequences same for both the CREATE and REFRESH
> > > commands. I have speculated on one idea above and would be happy to
> > > see your thoughts.
> >
> > I meant that the REFRESH PUBLICATION SEQUENCES command updates all
> > sequence states in pg_subscription_rel to 'init' state, and the
> > sequence-sync worker can do the synchronization work. We use the same
> > method for both the CREATE SUBSCRIPTION and REFRESH PUBLICATION
> > {SEQUENCES} commands.
> >
>
> Marking the state as 'init' when we would have already synced the
> sequences sounds a bit odd but otherwise, this could also work if we
> accept that even if the sequences are synced and value could remain in
> 'init' state (on rollbacks).
I mean that it's just for identifying sequences that need to be
synced. With the idea of using sequence states in pg_subscription_rel,
the REFRESH PUBLICATION SEQUENCES command needs to change states to
something so that the sequence-sync worker can identify which sequence
needs to be synced. If 'init' sounds odd, we can invent a new state
for sequences, say 'needs-to-be-syned'.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
From | Date | Subject | |
---|---|---|---|
Next Message | Zhijie Hou (Fujitsu) | 2024-06-18 02:14:16 | RE: Conflict Detection and Resolution |
Previous Message | Andres Freund | 2024-06-18 01:57:30 | Re: Inval reliability, especially for inplace updates |