From: | Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru> |
---|---|
To: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: [RFC] Lock-free XLog Reservation from WAL |
Date: | 2025-01-10 18:33:57 |
Message-ID: | 7b31f916-2b7d-49c7-b70a-b0342ba6b423@postgrespro.ru |
Lists: | pgsql-hackers |
On 10.01.2025 19:53, Matthias van de Meent wrote:
> On Fri, 10 Jan 2025 at 13:42, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru> wrote:
>>
>> BTW, your version could make alike trick for guaranteed atomicity:
>> - change XLogRecord's `XLogRecPtr xl_prev` to `uint32 xl_prev_offset`
>> and store offset to prev record's start.
>
> -1, I don't think that is possible without degrading what our current
> WAL system protects against.
>
> For intra-record torn write protection we have the checksum, but that
> same protection doesn't cover the multiple WAL records on each page.
> That is what the xl_prev pointer is used for - detecting that this
> part of the page doesn't contain the correct data (e.g. the data of a
> previous version of this recycled segment).
> If we replaced xl_prev with just an offset into the segment, then this
> protection would be much less effective, as the previous version of
> the segment realistically used the same segment offsets at the same
> offsets into the file.
Well, to protect against a "torn write" it is enough to have a "self-lsn"
field rather than a "prev-lsn". So an 8-byte "self-lsn" plus an
"offset-to-prev" would work. But that way the header grows by 4 bytes
compared to the current one, instead of shrinking.
Just a thought:
If XLogRecord alignment were stricter (for example, 32 bytes), then the LSN
could be a 32-byte-unit offset instead of a byte offset. The low 32 bits of
such an LSN would then cover 128GB of WAL. For most installations the re-use
distance for WAL segments is doubtfully longer than 128GB, but I believe
there are some with a larger one. So it is not reliable.
> To protect against torn writes while still only using record segment
> offsets, you'd have zero and then fsync any segment before reusing it,
> which would severely reduce the benefits we get from recycling
> segments.
> Note that we can't expect the page header to help here, as write tears
> can happen at nearly any offset into the page - not just 8k intervals
> - and so the page header is not always representative of the origins
> of all bytes on the page - only the first 24 (if even that).
-----
regards,
Yura