Re: logical decoding and replication of sequences, take 2

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Subject: Re: logical decoding and replication of sequences, take 2
Date: 2024-02-13 16:37:11
Message-ID: CA+TgmoaSCu=bLtU7Zyth=h2Lgtzxv=xfRwLYt2ZtSKA=UN2SnQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Jan 28, 2024 at 1:07 AM Tomas Vondra
<tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> Right, locks + apply in commit order gives us this guarantee (I can't
> think of a case where it wouldn't be the case).

I couldn't find any cases of inadequate locking other than the one I mentioned.

> Doesn't the whole logical replication critically depend on the commit
> order? If you decide to arbitrarily reorder/delay the transactions, all
> kinds of really bad things can happen. That's a generic problem, it
> applies to all kinds of objects, not just sequences - a parallel apply
> would need to detect this sort of dependencies (e.g. INSERT + DELETE of
> the same key), and do something about it.

Yes, but here I'm not just talking about the commit order. I'm talking
about the order of applying non-transactional operations relative to
commits.

Consider:

T1: CREATE SEQUENCE s;
T2: BEGIN;
T2: SELECT nextval('s');
T3: SELECT nextval('s');
T2: ALTER SEQUENCE s INCREMENT 2;
T2: SELECT nextval('s');
T2: COMMIT;

The commit order is T1 < T3 < T2, but T3 makes no transactional
changes, so the commit order is really just T1 < T2. But it's
completely wrong to say that all we need to do is apply T1 before we
apply T2. The correct order of application is:

1. T1.
2. T2's first nextval
3. T3's nextval
4. T2's transactional changes (i.e. the ALTER SEQUENCE INCREMENT and
the subsequent nextval)

In other words, the fact that some sequence changes are
non-transactional creates ordering hazards that don't exist if there
are no non-transactional changes. So in that way, sequences are
different from table modifications, where applying the transactions in
order of commit is all we need to do. Here we need to apply the
transactions in order of commit and also apply the non-transactional
changes at the right point in the sequence. Consider the following
alternative apply sequence:

1. T1.
2. T2's transactional changes (i.e. the ALTER SEQUENCE INCREMENT and
the subsequent nextval)
3. T3's nextval
4. T2's first nextval

That's still in commit order. It's also wrong.

Imagine that you commit this patch and someone later wants to do
parallel logical apply. So every time they finish decoding a
transaction, they stick it in a queue to be applied by the next
available worker. But, non-transactional changes are very simple, so
we just directly apply those in the main process. Well, kaboom! But
now this can happen with the above example.

1. Decode T1. Add to queue for apply.
2. Before the (idle) apply worker has a chance to pull T1 out of the
queue, decode the first nextval and try to apply it.

Oops. We're trying to apply a modification to a sequence that hasn't
been created yet. I'm not saying that this kind of hypothetical is a
reason not to commit the patch. But it seems like we're not on the
same page about what the ordering requirements are here. I'm just
making the argument that those non-transactional operations actually
act like mini-transactions. They need to happen at the right time
relative to the real transactions. A non-transactional operation needs
to be applied after any transactions that commit before it is logged,
and before any transactions that commit after it's logged.

> Yes, I think this is a bug in handling of owned sequences - from the
> moment the "ALTER TABLE ... SET UNLOGGED" is executed, the two sessions
> generate duplicate values (until the S1 is committed, at which point the
> values generated in S2 get "forgotten").
>
> It seems we end up updating both relfilenodes, which is clearly wrong.
>
> Seems like a bug independent of the decoding, IMO.

Yeah.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Joel Jacobson 2024-02-13 16:51:20 Re: Document efficient self-joins / UPDATE LIMIT techniques.
Previous Message Bertrand Drouvot 2024-02-13 15:55:07 Re: Synchronizing slots from primary to standby