From: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> |
---|---|
To: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> |
Cc: | masao(dot)fujii(at)oss(dot)nttdata(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, peter(dot)eisentraut(at)enterprisedb(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: sequences vs. synchronous replication |
Date: | 2021-12-24 07:23:13 |
Message-ID: | ad83e515-44f6-4691-8fe0-503b94499ed5@enterprisedb.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 12/24/21 06:37, Kyotaro Horiguchi wrote:
> At Thu, 23 Dec 2021 19:50:22 +0100, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> wrote in
>> On 12/23/21 15:42, Fujii Masao wrote:
>>> On 2021/12/23 3:49, Tomas Vondra wrote:
>>>> Attached is a patch tweaking WAL logging - in wal_level=minimal we do
>>>> the same thing as now, in higher levels we log every sequence fetch.
>>> Thanks for the patch!
>>> With the patch, I found that the regression test for sequences failed.
>>> + fetch = log = fetch;
>>> This should be "log = fetch"?
>>> On second thought, originally a sequence doesn't guarantee that the
>>> value already returned by nextval() will never be returned by
>>> subsequent nextval() after the server crash recovery. That is,
>>> nextval() may return the same value across crash recovery. Is this
>>> understanding right? For example, this case can happen if the server
>>> crashes after nextval() returned the value but before WAL for the
>>> sequence was flushed to the permanent storage.
>>
>> I think the important step is commit. We don't guarantee anything for
>> changes in uncommitted transactions. If you do nextval in a
>> transaction and the server crashes before the WAL gets flushed before
>> COMMIT, then yes, nextval may generate the same nextval again. But
>> after commit that is not OK - it must not happen.
>
> I don't mean to stand on Fujii-san's side particularly, but it seems
> to me sequences of RDBSs are not rolled back generally. Some googling
> told me that at least Oracle (documented), MySQL, DB2 and MS-SQL
> server doesn't rewind sequences at rollback, that is, sequences are
> incremented independtly from transaction control. It seems common to
> think that two nextval() calls for the same sequence must not return
> the same value in any context.
>
Yes, sequences are not rolled back on abort generally. That would
require much stricter locking, and that'd go against using sequences in
concurrent sessions.
But we're not talking about sequence rollback - we're talking about data
loss, caused by failure to flush WAL for a sequence. But that affects
the *current* code too, and to much greater extent.
Consider this:
BEGIN;
SELECT nextval('s') FROM generate_series(1,1000) s(i);
ROLLBACK; -- or crash of a different backend
BEGIN;
SELECT nextval('s');
COMMIT;
With the current code, this may easily lose the WAL, and we'll generate
duplicate values from the sequence. We pretty much ignore the COMMIT.
With the proposed change to WAL logging, that is not possible. The
COMMIT flushes enough WAL to prevent this issue.
So this actually makes this issue less severe.
Maybe I'm missing some important detail, though. Can you show an example
where the proposed changes make the issue worse?
>>> So it's not a bug that sync standby may return the same value as
>>> already returned in the primary because the corresponding WAL has not
>>> been replicated yet, isn't it?
>>>
>>
>> No, I don't think so. Once the COMMIT happens (and gets confirmed by
>> the sync standby), it should be possible to failover to the sync
>> replica without losing any data in committed transaction. Generating
>> duplicate values is a clear violation of that.
>
> So, strictly speaking, that is a violation of the constraint I
> mentioned regardless whether the transaction is committed or
> not. However we have technical limitations as below.
>
I don't follow. What violates what?
If the transaction commits (and gets a confirmation from sync replica),
the modified WAL logging prevents duplicate values. It does nothing for
uncommitted transactions. Seems like an improvement to me.
>> IMHO the fact that we allow a transaction to commit (even just
>> locally) without flushing all the WAL it depends on is clearly a data
>> loss bug.
>>
>>> BTW, if the returned value is stored in database, the same value is
>>> guaranteed not to be returned again after the server crash or by sync
>>> standby. Because in that case the WAL of the transaction storing that
>>> value is flushed and replicated.
>>>
>>
>> True, assuming the table is WAL-logged etc. I agree the issue may be
>> affecting a fairly small fraction of workloads, because most people
>> use sequences to generate data for inserts etc.
>
> It seems to me, from the fact that sequences are designed explicitly
> untransactional and that behavior is widely adopted, the discussion
> might be missing some significant use-cases. But there's a
> possibility that the spec of sequence came from some technical
> limitation in the past, but I'm not sure..
>
No idea. IMHO from the correctness / behavior point of view, the
modified logging is an improvement. The only issue is the additional
overhead, and I think the cache addresses that quite well.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From | Date | Subject | |
---|---|---|---|
Next Message | Fujii Masao | 2021-12-24 08:01:55 | Re: Allow escape in application_name |
Previous Message | Kyotaro Horiguchi | 2021-12-24 06:32:10 | Re: An obsolete comment of pg_stat_statements |