From: | Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com> |
---|---|
To: | Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru> |
Cc: | "Zhou, Zhiguo" <zhiguo(dot)zhou(at)intel(dot)com>, wenhui qiu <qiuwenhuifx(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [RFC] Lock-free XLog Reservation from WAL |
Date: | 2025-01-10 16:53:14 |
Message-ID: | CAEze2Wj+xro-X1PBAPFQR4eHuyqeWN=A7awuOsdLBzGTbjwW4A@mail.gmail.com |
Lists: | pgsql-hackers |
On Fri, 10 Jan 2025 at 13:42, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru> wrote:
>
> BTW, your version could make alike trick for guaranteed atomicity:
> - change XLogRecord's `XLogRecPtr xl_prev` to `uint32 xl_prev_offset`
> and store offset to prev record's start.
-1, I don't think that is possible without degrading what our current
WAL system protects against.
For intra-record torn write protection we have the checksum, but that
same protection doesn't cover the multiple WAL records on each page.
That is what the xl_prev pointer is used for - detecting that this
part of the page doesn't contain the correct data (e.g. the data of a
previous version of this recycled segment).
If we replaced xl_prev with just an offset into the segment, then this
protection would be much less effective, as the records in the previous
version of the segment realistically stored the same segment offsets at
the same positions in the file.
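
To make the difference concrete, here is a minimal sketch of the two checks. It is not PostgreSQL's actual reader code: the type `Lsn`, the helper `lsn_at()` and both `prev_link_matches_*` functions are hypothetical names made up for illustration, and it assumes the default 16MB segment size and that "offset" means the offset of the previous record within the segment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for XLogRecPtr. */
typedef uint64_t Lsn;

/* Assumption: the default WAL segment size of 16MB. */
#define SEGMENT_SIZE ((Lsn) 16 * 1024 * 1024)

/* LSN of the byte at 'offset' inside segment number 'segno'. */
static Lsn
lsn_at(uint64_t segno, uint32_t offset)
{
    return segno * SEGMENT_SIZE + offset;
}

/* Back-link check with a full LSN: must equal the previous record's LSN. */
static bool
prev_link_matches_lsn(Lsn stored_xl_prev, Lsn expected_prev_lsn)
{
    return stored_xl_prev == expected_prev_lsn;
}

/* Back-link check when only the offset within the segment is stored. */
static bool
prev_link_matches_offset(uint32_t stored_prev_offset, Lsn expected_prev_lsn)
{
    return stored_prev_offset == (uint32_t) (expected_prev_lsn % SEGMENT_SIZE);
}
```

Suppose a file was last written as segment 5 and is recycled as segment 9, and a leftover record in it points back to offset 0x1000. With the full LSN, `prev_link_matches_lsn(lsn_at(5, 0x1000), lsn_at(9, 0x1000))` is false, so the stale record is rejected. With only the offset, `prev_link_matches_offset(0x1000, lsn_at(9, 0x1000))` is true whenever the new contents happen to have a record boundary at the same place, and then only the other header fields and the checksum are left to catch it.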
To protect against torn writes while still only using record segment
offsets, you'd have to zero and then fsync any segment before reusing it,
which would severely reduce the benefits we get from recycling
segments.
Note that we can't expect the page header to help here, as write tears
can happen at nearly any offset into the page - not just 8k intervals
- and so the page header is not always representative of the origins
of all bytes on the page - only the first 24 (if even that).
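
As a rough illustration of that last point, the sketch below simulates a torn overwrite of one WAL page. Everything in it is an assumption for the sake of the example: the 512-byte atomic write unit (real devices vary), the 8kB page size, and the helper names `torn_write()`, `header_is_new()` and `first_stale_byte()`, none of which exist in PostgreSQL.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WAL_PAGE_SIZE   8192  /* assumed WAL page size */
#define SECTOR_SIZE     512   /* assumed atomic write unit; devices vary */
#define PAGE_HEADER_LEN 24    /* the header bytes mentioned above */

/*
 * Simulate a crash while overwriting a page: only the first
 * 'sectors_persisted' sectors of new_page reach 'disk'; the remainder of
 * 'disk' keeps whatever the recycled segment contained before.
 */
static void
torn_write(uint8_t *disk, const uint8_t *new_page, size_t sectors_persisted)
{
    size_t bytes = sectors_persisted * SECTOR_SIZE;

    if (bytes > WAL_PAGE_SIZE)
        bytes = WAL_PAGE_SIZE;
    memcpy(disk, new_page, bytes);
}

/* True if the on-disk page header matches the header of the new page. */
static bool
header_is_new(const uint8_t *disk, const uint8_t *new_page)
{
    return memcmp(disk, new_page, PAGE_HEADER_LEN) == 0;
}

/* Offset of the first byte that differs, or WAL_PAGE_SIZE if none does. */
static size_t
first_stale_byte(const uint8_t *disk, const uint8_t *new_page)
{
    for (size_t i = 0; i < WAL_PAGE_SIZE; i++)
    {
        if (disk[i] != new_page[i])
            return i;
    }
    return WAL_PAGE_SIZE;
}
```

If three sectors of the new page are persisted over old contents, `header_is_new()` reports true while `first_stale_byte()` returns 1536, so a valid-looking page header says nothing about the records stored further into the page.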
Kind regards,
Matthias van de Meent