From: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> |
---|---|
To: | Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com> |
Subject: | Re: logical decoding and replication of sequences, take 2 |
Date: | 2023-06-13 17:31:08 |
Message-ID: | 637c4c06-2ed4-2a37-11f5-9eb9c2d43f36@enterprisedb.com |
Lists: | pgsql-hackers |
On 5/18/23 16:23, Ashutosh Bapat wrote:
> Hi,
> Sorry for jumping late in this thread.
>
> I started experimenting with the functionality. Maybe something that
> was already discussed earlier. Given that the thread is being
> discussed for so long and has gone through several changes, revalidating the
> functionality is useful.
>
> I considered following aspects:
> Changes to the sequence on subscriber
> -----------------------------------------------------
> 1. Since this is logical decoding, logical replica is writable. So the
> logically replicated sequence can be manipulated on the subscriber as
> well. This implementation consolidates the changes on subscriber and
> publisher rather than replicating the publisher state as is. That's
> good. See example command sequence below
> a. publisher calls nextval() - this sets the sequence state on
> publisher as (1, 32, t) which is replicated to the subscriber.
> b. subscriber calls nextval() once - this sets the sequence state on
> subscriber as (34, 32, t)
> c. subscriber calls nextval() 32 times - on-disk state of sequence
> doesn't change on subscriber
> d. subscriber calls nextval() 33 times - this sets the sequence state
> on subscriber as (99, 0, t)
> e. publisher calls nextval() 32 times - this sets the sequence state
> on publisher as (33, 0, t)
>
> The on-disk state on publisher at the end of e. is replicated to the
> subscriber but subscriber doesn't apply it. The state there is still
> (99, 0, t). I think this is closer to how logical replication of
> sequence should look like. This is also good enough as long as we
> expect the replication of sequences to be used for failover and
> switchover.
>
I'm really confused - are you describing what the patch is doing, or
what you think it should be doing? Because right now there's nothing
that'd "consolidate" the changes (in the sense of reconciling write
conflicts), and there's absolutely no way to do that.
So if the subscriber advances the sequence (which it technically can),
the subscriber state will eventually be discarded and overwritten
when the next increment gets decoded from WAL on the publisher.
There's no way to fix this with this type of sequence - it requires some
sort of global consensus (consensus on range assignment, locking or
whatever), which we don't have.
If the sequence is the only thing replicated, this may go unnoticed. But
chances are the user is also replicating a table whose PK is populated by
the sequence, at which point it'll lead to a constraint violation.
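To make the failure mode concrete, here is a toy model in Python. It is only a sketch under simplifying assumptions: it ignores the pre-logging of sequence values in WAL, treats the decoded change as simply the publisher's current value, and all names (Node, apply_sequence_change, apply_insert, demo) are made up for illustration and are not from the patch.

```python
from dataclasses import dataclass, field


class DuplicateKey(Exception):
    """Stands in for a primary key violation on the subscriber."""


@dataclass
class Node:
    """Toy node: one sequence value plus the set of PK values in one table."""
    last_value: int = 0
    rows: set = field(default_factory=set)

    def nextval(self) -> int:
        self.last_value += 1
        return self.last_value

    def insert_local(self) -> int:
        # Local INSERT whose PK comes from the local copy of the sequence.
        key = self.nextval()
        self.rows.add(key)
        return key


def apply_sequence_change(sub: Node, value: int) -> None:
    # The decoded state replaces the local state; nothing is reconciled.
    sub.last_value = value


def apply_insert(sub: Node, key: int) -> None:
    if key in sub.rows:
        raise DuplicateKey(key)
    sub.rows.add(key)


def demo():
    pub, sub = Node(), Node()

    # Publisher inserts one row; the sequence change and the row replicate.
    key = pub.insert_local()
    apply_sequence_change(sub, pub.last_value)
    apply_insert(sub, key)

    # Subscriber writes locally, advancing its own copy of the sequence.
    local_keys = [sub.insert_local() for _ in range(3)]

    # The next publisher insert overwrites the subscriber's sequence state
    # and then collides with a row the subscriber created on its own.
    key = pub.insert_local()
    apply_sequence_change(sub, pub.last_value)
    try:
        apply_insert(sub, key)
        conflict = False
    except DuplicateKey:
        conflict = True
    return local_keys, sub.last_value, conflict
```

In this model the subscriber's local advance is lost as soon as the next publisher change is applied, and the replicated row then hits a key the subscriber already used.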
> But it might not help if we want to consolidate the INSERTs that use
> nextval(). If we were to treat sequences as accumulating the
> increments, we might be able to resolve the conflicts by adjusting the
> columns values considering the increments made on subscriber. IIUC,
> conflict resolution is not part of built-in logical replication. So we
> may not want to go this route. But worth considering.
We can't just adjust values in columns that may be used externally.
>
> Implementation agnostic decoded change
> --------------------------------------------------------
> Current method of decoding and replicating the sequences is tied to
> the implementation - it replicates the sequence row as is. If the
> implementation changes in future, we might need to revise the decoded
> presentation of sequence. I think only nextval() matters for sequence.
> So as long as we are replicating information enough to calculate the
> nextval we should be good. Current implementation does that by
> replicating the log_value and is_called. is_called can be consolidated
> into log_value itself. The implemented protocol, thus requires two
> extra values to be replicated. Those can be ignored right now. But
> they might pose a problem in future, if some downstream starts using
> them. We will be forced to provide fake but sane values even if a
> future upstream implementation does not produce those values. Of
> course we can't predict the future implementation enough to decide
> what would be an implementation independent format. E.g. if a
> pluggable storage were to be used to implement sequences or if we come
> around implementing distributed sequences, their shape can't be
> predicted right now. So a change in protocol seems to be unavoidable
> whatever we do. But starting with bare minimum might save us from
> larger troubles. I think, it's better to just replicate the nextval()
> and craft the representation on subscriber so that it produces that
> nextval().
Yes, I agree with this. It's probably better to replicate just the next
value, without the log_cnt / is_called fields (which are implementation
specific).
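As a rough illustration of what that could look like, the Python sketch below derives the next value from the current tuple and builds a subscriber-side state that yields it. The only thing it relies on is that nextval() returns last_value when is_called is false and last_value plus the increment otherwise. The function names are hypothetical, and bounds (MINVALUE/MAXVALUE) and CYCLE are ignored.

```python
from typing import NamedTuple


class SequenceState(NamedTuple):
    """The three fields of the sequence tuple discussed above."""
    last_value: int
    log_cnt: int
    is_called: bool


def next_value(state: SequenceState, increment: int = 1) -> int:
    # What the next nextval() call would return for this state
    # (ignoring MINVALUE/MAXVALUE and CYCLE).
    if state.is_called:
        return state.last_value + increment
    return state.last_value


def state_from_next_value(next_val: int) -> SequenceState:
    # One possible subscriber-side representation (an assumption, not the
    # patch's choice): store the value itself and mark it as not yet returned.
    return SequenceState(last_value=next_val, log_cnt=0, is_called=False)
```

With this shape only a single number would need to go over the wire, and the subscriber is free to pick whatever local representation reproduces it.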
>
> 3. Primary key sequences
> -----------------------------------
> I have not experimented with this. But I think we will need to add the
> sequences associated with the primary keys to the publications
> publishing the owner tables. Otherwise, we will have problems with the
> failover. And it needs to be done automatically since a. the names of
> these sequences are generated automatically b. publications with FOR
> ALL TABLES will add tables automatically and start replicating the
> changes. Users may not be able to intercept the replication activity
> to make sure the associated sequences are also added to the publication.
>
Right, this idea was mentioned before, and I agree maybe we should
consider adding some of those "automatic" sequences automatically.
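Conceptually, the selection could be as simple as the Python sketch below: given the tables in a publication and a map from each sequence to the table that owns it, pick the owned sequences. This is purely illustrative; the function and the input shapes are hypothetical and say nothing about how it would be done in the catalogs.

```python
from typing import Dict, List, Optional, Set


def sequences_to_publish(
    published_tables: Set[str],
    owned_by: Dict[str, Optional[str]],
) -> List[str]:
    """Return the sequences owned by any published table.

    owned_by maps a sequence name to its owning table, or None for a
    standalone sequence that no table column owns.
    """
    return sorted(
        seq for seq, table in owned_by.items() if table in published_tables
    )
```

Standalone sequences are left out here, so they would still have to be added to the publication explicitly.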
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company