Re: [RFC] Lock-free XLog Reservation from WAL

From: "Zhou, Zhiguo" <zhiguo(dot)zhou(at)intel(dot)com>
To: Юрий Соколов <y(dot)sokolov(at)postgrespro(dot)ru>
Cc: wenhui qiu <qiuwenhuifx(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [RFC] Lock-free XLog Reservation from WAL
Date: 2025-01-09 16:03:46
Message-ID: ebeb877e-603a-418a-b82b-b2863471e44a@intel.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 1/7/2025 10:49 AM, Юрий Соколов wrote:
>
>> On 6 Jan 2025, at 09:46, Zhou, Zhiguo <zhiguo(dot)zhou(at)intel(dot)com> wrote:
>>
>> Hi Yura and Wenhui,
>>
>> Thanks for kindly reviewing this work!
>>
>> On 1/3/2025 9:01 PM, wenhui qiu wrote:
>>> Hi
>>> Thank you for your path,NUM_XLOGINSERT_LOCKS increase to 128,I
>>> think it will be challenged,do we make it guc ?
>>
>> I noticed there have been some discussions (for example, [1] and its
>> responses) about making NUM_XLOGINSERT_LOCKS a GUC, which seems to be
>> a controversial proposal. Given that, we may first focus on the lock-
>> free XLog reservation implementation, and leave the increase of
>> NUM_XLOGINSERT_LOCKS for a future patch, where we would provide more
>> quantitative evidence for the various implementations. WDYT?
>>
>>
>>> On Fri, 3 Jan 2025 at 20:36, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru
>>> <mailto:y(dot)sokolov(at)postgrespro(dot)ru><mailto:y(dot)sokolov(at)postgrespro(dot)ru
>>> <mailto:y(dot)sokolov(at)postgrespro(dot)ru>>> wrote:
>>>    Good day, Zhiguo.
>>>    Idea looks great.
>>>    Minor issue:
>>>    - you didn't remove use of `insertpos_lck` from `ReserveXLogSwitch`.
>>>    I initially thought it became un-synchronized against
>>>    `ReserveXLogInsertLocation`, but looking closer I found it is
>>>    synchronized with `WALInsertLockAcquireExclusive`.
>>>    Since there are no other `insertpos_lck` usages after your patch, I
>>>    don't see why it should exists and be used in `ReserveXLogSwitch`.
>>>    Still I'd prefer to see CAS loop in this place to be consistent with
>>>    other non-locking access. And it will allow to get rid of
>>>    `WALInsertLockAcquireExclusive`, (though probably it is not a big
>>>    issue).
>>
>> Exactly, it should be safe to remove `insertpos_lck`. And I agree with
>> you on getting rid of `WALInsertLockAcquireExclusive` with CAS loop
>> which should significantly reduce the synchronization cost here
>> especially when we intend to increase NUM_XLOGINSERT_LOCKS. I will try
>> it in the next version of patch.
>>
>>
>>>    Major issue:
>>>    - `SetPrevRecPtr` and `GetPrevRecPtr` do non-atomic write/read with on
>>>    platforms where MAXALIGN != 8 or without native 64 load/store. Branch
>>>    with 'memcpy` is rather obvious, but even pointer de-referencing on
>>>    "lucky case" is not safe either.
>>>    I have no idea how to fix it at the moment.
>>
>> Indeed, non-atomic write/read operations can lead to safety issues in
>> some situations. My initial thought is to define a bit near the prev-
>> link to flag the completion of the update. In this way, we could allow
>> non-atomic or even discontinuous write/read operations on the prev-
>> link, while simultaneously guaranteeing its atomicity through atomic
>> operations (as well as memory barriers) on the flag bit. What do you
>> think of this as a viable solution?
>>
>>
>>>    Readability issue:
>>>    - It would be good to add `Assert(ptr >= upto)` into `GetXLogBuffer`.
>>>    I had hard time to recognize `upto` is strictly not in the future.
>>>    - Certainly, final version have to have fixed and improved comments.
>>>    Many patch's ideas are strictly non-obvious. I had hard time to
>>>    recognize patch is not a piece of ... (excuse me for the swear
>>>    sentence).
>>
>> Thanks for the suggestion and patience. It's really more readable
>> after inserting the assertion, I will fix it and improve other
>> comments in the following patches.
>>
>>
>>>    Indeed, patch is much better than it looks on first sight.
>>>    I came with alternative idea yesterday, but looking closer to your
>>>    patch
>>>    today I see it is superior to mine (if atomic access will be fixed).
>>
>> [1]https://www.postgresql.org/message-
>> id/2266698.1704854297%40sss.pgh.pa.us <https://www.postgresql.org/
>> message-id/2266698.1704854297%40sss.pgh.pa.us>
>
> Good day, Zhiguo.
>
> Here’s my attempt to organise link to previous record without messing
> with xlog buffers:
> - link is stored in lock-free hash table instead.
>
> I don’t claim it is any better than using xlog buffers.
> It is just alternative vision.
>
> Some tricks in implementation:
> - Relying on byte-position nature, it could be converted to 32 bit unique
>   value with `(uint32)(pos ^ (pos>>32))`. Certainly it is not totally
> unique,
>   but it is certainly unique among 32GB consecutive log.
> - PrevBytePos could be calculated as a difference between positions, and
>   this difference is certainly less than 4GB, so it also could be
> stored as 32
>   bit value (PrevSize).
> - Since xlog records are aligned we could use lowest bit of PrevSize as
> a lock.
> - While Cuckoo Hashing could suffer from un-solvable cycle conflicts,
> this implementation relies on concurrent deleters which will eventually
> break such cycles if any.
>
> I have a version without 32bit conversion trick, and it is a bit lighter
> on atomic instructions count, but it performs badly in absence of native
> 64bit atomics.
>
> ——
> regards
> Yura Sokolov aka funny-falcon

Good day, Yura!

Your implementation based on the lock-free hash table is truly
impressive! One of the aspects I particularly admire is how your
solution doesn't require breaking the current convention of XLog
insertion, whose revision is quite error-prone and ungraceful. My minor
concern is that the limited number of entries (256) in the hash table
would be a bottleneck for parallel memory reservation, but I believe
this is not a critical issue.

I will soon try to evaluate the performance impact of your patch on my
device with the TPCC benchmark and also profile it to see if there are
any changes that could be made to further improve it.

BTW, do you have a plan to merge this patch to the master branch? Thanks!

Regards,
Zhiguo

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2025-01-09 16:17:31 Re: [PoC] Federated Authn/z with OAUTHBEARER
Previous Message Bertrand Drouvot 2025-01-09 15:30:37 Re: Make pg_stat_io view count IOs as bytes instead of blocks