From: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> |
---|---|
To: | tomas(dot)vondra(at)enterprisedb(dot)com |
Cc: | masao(dot)fujii(at)oss(dot)nttdata(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, peter(dot)eisentraut(at)enterprisedb(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: sequences vs. synchronous replication |
Date: | 2021-12-24 05:37:25 |
Message-ID: | 20211224.143725.1750248643492340056.horikyota.ntt@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
At Thu, 23 Dec 2021 19:50:22 +0100, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> wrote in
> On 12/23/21 15:42, Fujii Masao wrote:
> > On 2021/12/23 3:49, Tomas Vondra wrote:
> >> Attached is a patch tweaking WAL logging - in wal_level=minimal we do
> >> the same thing as now, in higher levels we log every sequence fetch.
> > Thanks for the patch!
> > With the patch, I found that the regression test for sequences failed.
> > + fetch = log = fetch;
> > This should be "log = fetch"?
> > On second thought, originally a sequence doesn't guarantee that the
> > value already returned by nextval() will never be returned by
> > subsequent nextval() after the server crash recovery. That is,
> > nextval() may return the same value across crash recovery. Is this
> > understanding right? For example, this case can happen if the server
> > crashes after nextval() returned the value but before WAL for the
> > sequence was flushed to the permanent storage.
>
> I think the important step is commit. We don't guarantee anything for
> changes in uncommitted transactions. If you do nextval in a
> transaction and the server crashes before the WAL gets flushed before
> COMMIT, then yes, nextval may generate the same nextval again. But
> after commit that is not OK - it must not happen.
I don't mean to stand on Fujii-san's side particularly, but it seems
to me sequences of RDBSs are not rolled back generally. Some googling
told me that at least Oracle (documented), MySQL, DB2 and MS-SQL
server doesn't rewind sequences at rollback, that is, sequences are
incremented independtly from transaction control. It seems common to
think that two nextval() calls for the same sequence must not return
the same value in any context.
> > So it's not a bug that sync standby may return the same value as
> > already returned in the primary because the corresponding WAL has not
> > been replicated yet, isn't it?
> >
>
> No, I don't think so. Once the COMMIT happens (and gets confirmed by
> the sync standby), it should be possible to failover to the sync
> replica without losing any data in committed transaction. Generating
> duplicate values is a clear violation of that.
So, strictly speaking, that is a violation of the constraint I
mentioned regardless whether the transaction is committed or
not. However we have technical limitations as below.
> IMHO the fact that we allow a transaction to commit (even just
> locally) without flushing all the WAL it depends on is clearly a data
> loss bug.
>
> > BTW, if the returned value is stored in database, the same value is
> > guaranteed not to be returned again after the server crash or by sync
> > standby. Because in that case the WAL of the transaction storing that
> > value is flushed and replicated.
> >
>
> True, assuming the table is WAL-logged etc. I agree the issue may be
> affecting a fairly small fraction of workloads, because most people
> use sequences to generate data for inserts etc.
It seems to me, from the fact that sequences are designed explicitly
untransactional and that behavior is widely adopted, the discussion
might be missing some significant use-cases. But there's a
possibility that the spec of sequence came from some technical
limitation in the past, but I'm not sure..
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
From | Date | Subject | |
---|---|---|---|
Next Message | Kyotaro Horiguchi | 2021-12-24 05:51:34 | Re: Add checkpoint and redo LSN to LogCheckpointEnd log message |
Previous Message | Peter Smith | 2021-12-24 05:34:34 | Re: row filtering for logical replication |