From: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> |
---|---|
To: | tomas(dot)vondra(at)enterprisedb(dot)com |
Cc: | masao(dot)fujii(at)oss(dot)nttdata(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, peter(dot)eisentraut(at)enterprisedb(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: sequences vs. synchronous replication |
Date: | 2021-12-24 08:04:51 |
Message-ID: | 20211224.170451.116141220257426861.horikyota.ntt@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
At Fri, 24 Dec 2021 08:23:13 +0100, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> wrote in
>
>
> On 12/24/21 06:37, Kyotaro Horiguchi wrote:
> > At Thu, 23 Dec 2021 19:50:22 +0100, Tomas Vondra
> > <tomas(dot)vondra(at)enterprisedb(dot)com> wrote in
> >> On 12/23/21 15:42, Fujii Masao wrote:
> >>> On 2021/12/23 3:49, Tomas Vondra wrote:
> >>>> Attached is a patch tweaking WAL logging - in wal_level=minimal we do
> >>>> the same thing as now, in higher levels we log every sequence fetch.
> >>> Thanks for the patch!
> >>> With the patch, I found that the regression test for sequences failed.
> >>> + fetch = log = fetch;
> >>> This should be "log = fetch"?
> >>> On second thought, originally a sequence doesn't guarantee that the
> >>> value already returned by nextval() will never be returned by
> >>> subsequent nextval() after the server crash recovery. That is,
> >>> nextval() may return the same value across crash recovery. Is this
> >>> understanding right? For example, this case can happen if the server
> >>> crashes after nextval() returned the value but before WAL for the
> >>> sequence was flushed to the permanent storage.
> >>
> >> I think the important step is commit. We don't guarantee anything for
> >> changes in uncommitted transactions. If you do nextval in a
> >> transaction and the server crashes before the WAL gets flushed before
> >> COMMIT, then yes, nextval may generate the same nextval again. But
> >> after commit that is not OK - it must not happen.
> > I don't mean to stand on Fujii-san's side particularly, but it seems
> > to me sequences of RDBSs are not rolled back generally. Some googling
> > told me that at least Oracle (documented), MySQL, DB2 and MS-SQL
> > server doesn't rewind sequences at rollback, that is, sequences are
> > incremented independtly from transaction control. It seems common to
> > think that two nextval() calls for the same sequence must not return
> > the same value in any context.
> >
>
> Yes, sequences are not rolled back on abort generally. That would
> require much stricter locking, and that'd go against using sequences
> in concurrent sessions.
I thinks so.
> But we're not talking about sequence rollback - we're talking about
> data loss, caused by failure to flush WAL for a sequence. But that
> affects the *current* code too, and to much greater extent.
Ah, yes, I don't object to that aspect.
> Consider this:
>
> BEGIN;
> SELECT nextval('s') FROM generate_series(1,1000) s(i);
> ROLLBACK; -- or crash of a different backend
>
> BEGIN;
> SELECT nextval('s');
> COMMIT;
>
> With the current code, this may easily lose the WAL, and we'll
> generate duplicate values from the sequence. We pretty much ignore the
> COMMIT.
>
> With the proposed change to WAL logging, that is not possible. The
> COMMIT flushes enough WAL to prevent this issue.
>
> So this actually makes this issue less severe.
>
> Maybe I'm missing some important detail, though. Can you show an
> example where the proposed changes make the issue worse?
No. It seems to me improvoment at least from the current state, for
the reason you mentioned.
> >>> So it's not a bug that sync standby may return the same value as
> >>> already returned in the primary because the corresponding WAL has not
> >>> been replicated yet, isn't it?
> >>>
> >>
> >> No, I don't think so. Once the COMMIT happens (and gets confirmed by
> >> the sync standby), it should be possible to failover to the sync
> >> replica without losing any data in committed transaction. Generating
> >> duplicate values is a clear violation of that.
> > So, strictly speaking, that is a violation of the constraint I
> > mentioned regardless whether the transaction is committed or
> > not. However we have technical limitations as below.
> >
>
> I don't follow. What violates what?
>
> If the transaction commits (and gets a confirmation from sync
> replica), the modified WAL logging prevents duplicate values. It does
> nothing for uncommitted transactions. Seems like an improvement to me.
Sorry for the noise. I misunderstand that ROLLBACK is being changed to
rollback sequences.
> No idea. IMHO from the correctness / behavior point of view, the
> modified logging is an improvement. The only issue is the additional
> overhead, and I think the cache addresses that quite well.
Now I understand the story here.
I agree that the patch is improvment from the current behavior.
I agree that the overhead is eventually-nothing for WAL-emitting workloads.
Still, as Fujii-san concerns, I'm afraid that some people may suffer
the degradation the patch causes. I wonder it is acceptable to get
back the previous behavior by exposing SEQ_LOG_VALS itself or a
boolean to do that, as a 'not-recommended-to-use' variable.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
From | Date | Subject | |
---|---|---|---|
Next Message | Masahiko Sawada | 2021-12-24 08:06:57 | Re: more descriptive message for process termination due to max_slot_wal_keep_size |
Previous Message | Fujii Masao | 2021-12-24 08:01:55 | Re: Allow escape in application_name |