From: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
---|---|
To: | "matsumura(dot)ryo(at)fujitsu(dot)com" <matsumura(dot)ryo(at)fujitsu(dot)com>, 'Kyotaro Horiguchi' <horikyota(dot)ntt(at)gmail(dot)com> |
Cc: | "bossartn(at)amazon(dot)com" <bossartn(at)amazon(dot)com>, "masao(dot)fujii(at)gmail(dot)com" <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: archive status ".ready" files may be created too early |
Date: | 2020-10-12 12:04:40 |
Message-ID: | 65e309c0-5d51-7f9e-80ed-947414db8280@iki.fi |
Lists: | pgsql-hackers |
On 07/07/2020 12:02, matsumura(dot)ryo(at)fujitsu(dot)com wrote:
> At Monday, July 6, 2020 05:13:40 +0000, "Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>" wrote in
>>>> after WAL buffer is filled up to the requested position. So when it
>>>> crosses a segment boundary, we know that all past cross-segment-boundary
>>>> records are stable. That means all we need to remember is the
>>>> position of the latest cross-boundary record.
>>>
>>> I don't agree. In the following case, it may not work well.
>>> - record-A and record-B (record-B is the newer one) are copied, and
>>> - lastSegContRecStart/End point to record-B's positions, and
>>> - FlushPtr has advanced to the middle of record-A.
>>
>> IIUC, that means record-B is a cross-segment-border record and we have
>> flushed beyond record-B. In that case crash recovery afterwards
>> can read the complete record-B and will finish recovery *after*
>> record-B. That's what we need here.
>
> I'm sorry I didn't explain enough.
>
> Record-A and Record-B are cross segment-border records.
> Record-A spans segment X and X+1
> Record-B spans segment X+2 and X+3.
> If both records have been inserted into the WAL buffer, lastSegContRecStart/End point to Record-B.
> If a writer flushes up to the middle of segment X+1, NotifyStableSegments() allows the writer to notify segment X.
> Is my understanding correct?
I think this little ASCII drawing illustrates the above scenario:
       AAAAA  F            BBBBB
|---------|---------|---------|
   seg X    seg X+1   seg X+2
AAAAA and BBBBB are Record-A and Record-B. F is the current flush pointer.
In this case, it would be OK to notify segment X, as long as F is
greater than the end of record A. And if I'm reading Kyotaro's patch
correctly, that's what would happen with the patch.
The patch seems correct to me. I'm a bit sad that we have to track yet
another WAL position (two, actually) to fix this, but I don't see a
better way.
I wonder whether we should arrange things so that XLogwrtResult.Flush never
points into the middle of a record. I'm not totally convinced that all the
current callers of GetFlushRecPtr() are OK with a value that points into
the middle of a WAL record. Could we get into similar trouble if a standby replicates half of
a cross-segment record to a cascaded standby, and the cascaded standby
has WAL archiving enabled?
- Heikki