Re: [RFC] Lock-free XLog Reservation from WAL

From: Юрий Соколов <y(dot)sokolov(at)postgrespro(dot)ru>
To: "Zhou, Zhiguo" <zhiguo(dot)zhou(at)intel(dot)com>
Cc: wenhui qiu <qiuwenhuifx(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [RFC] Lock-free XLog Reservation from WAL
Date: 2025-01-07 02:49:13
Message-ID: 80DD6982-98E1-45A5-A1D8-0C49965DD45D@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


> On 6 Jan 2025, at 09:46, Zhou, Zhiguo <zhiguo(dot)zhou(at)intel(dot)com> wrote:
>
> Hi Yura and Wenhui,
>
> Thanks for kindly reviewing this work!
>
> On 1/3/2025 9:01 PM, wenhui qiu wrote:
>> Hi
>> Thank you for your path,NUM_XLOGINSERT_LOCKS increase to 128,I think it will be challenged,do we make it guc ?
>
> I noticed there have been some discussions (for example, [1] and its responses) about making NUM_XLOGINSERT_LOCKS a GUC, which seems to be a controversial proposal. Given that, we may first focus on the lock-free XLog reservation implementation, and leave the increase of NUM_XLOGINSERT_LOCKS for a future patch, where we would provide more quantitative evidence for the various implementations. WDYT?
>
>
>> On Fri, 3 Jan 2025 at 20:36, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru <mailto:y(dot)sokolov(at)postgrespro(dot)ru><mailto:y(dot)sokolov(at)postgrespro(dot)ru>> wrote:
>> Good day, Zhiguo.
>> Idea looks great.
>> Minor issue:
>> - you didn't remove use of `insertpos_lck` from `ReserveXLogSwitch`.
>> I initially thought it became un-synchronized against
>> `ReserveXLogInsertLocation`, but looking closer I found it is
>> synchronized with `WALInsertLockAcquireExclusive`.
>> Since there are no other `insertpos_lck` usages after your patch, I
>> don't see why it should exists and be used in `ReserveXLogSwitch`.
>> Still I'd prefer to see CAS loop in this place to be consistent with
>> other non-locking access. And it will allow to get rid of
>> `WALInsertLockAcquireExclusive`, (though probably it is not a big
>> issue).
>
> Exactly, it should be safe to remove `insertpos_lck`. And I agree with you on getting rid of `WALInsertLockAcquireExclusive` with CAS loop which should significantly reduce the synchronization cost here especially when we intend to increase NUM_XLOGINSERT_LOCKS. I will try it in the next version of patch.
>
>
>> Major issue:
>> - `SetPrevRecPtr` and `GetPrevRecPtr` do non-atomic write/read with on
>> platforms where MAXALIGN != 8 or without native 64 load/store. Branch
>> with 'memcpy` is rather obvious, but even pointer de-referencing on
>> "lucky case" is not safe either.
>> I have no idea how to fix it at the moment.
>
> Indeed, non-atomic write/read operations can lead to safety issues in some situations. My initial thought is to define a bit near the prev-link to flag the completion of the update. In this way, we could allow non-atomic or even discontinuous write/read operations on the prev-link, while simultaneously guaranteeing its atomicity through atomic operations (as well as memory barriers) on the flag bit. What do you think of this as a viable solution?
>
>
>> Readability issue:
>> - It would be good to add `Assert(ptr >= upto)` into `GetXLogBuffer`.
>> I had hard time to recognize `upto` is strictly not in the future.
>> - Certainly, final version have to have fixed and improved comments.
>> Many patch's ideas are strictly non-obvious. I had hard time to
>> recognize patch is not a piece of ... (excuse me for the swear
>> sentence).
>
> Thanks for the suggestion and patience. It's really more readable after inserting the assertion, I will fix it and improve other comments in the following patches.
>
>
>> Indeed, patch is much better than it looks on first sight.
>> I came with alternative idea yesterday, but looking closer to your
>> patch
>> today I see it is superior to mine (if atomic access will be fixed).
>
> [1] https://www.postgresql.org/message-id/2266698.1704854297%40sss.pgh.pa.us

Good day, Zhiguo.

Here’s my attempt to organise link to previous record without messing with xlog buffers:
- link is stored in lock-free hash table instead.

I don’t claim it is any better than using xlog buffers.
It is just alternative vision.

Some tricks in implementation:
- Relying on byte-position nature, it could be converted to 32 bit unique
value with `(uint32)(pos ^ (pos>>32))`. Certainly it is not totally unique,
but it is certainly unique among 32GB consecutive log.
- PrevBytePos could be calculated as a difference between positions, and
this difference is certainly less than 4GB, so it also could be stored as 32
bit value (PrevSize).
- Since xlog records are aligned we could use lowest bit of PrevSize as a lock.
- While Cuckoo Hashing could suffer from un-solvable cycle conflicts, this implementation relies on concurrent deleters which will eventually break such cycles if any.

I have a version without 32bit conversion trick, and it is a bit lighter on atomic instructions count, but it performs badly in absence of native 64bit atomics.

——
regards
Yura Sokolov aka funny-falcon

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2025-01-07 03:29:07 Re: allow changing autovacuum_max_workers without restarting
Previous Message Nathan Bossart 2025-01-07 02:44:37 Re: allow changing autovacuum_max_workers without restarting