Re: [RFC] Lock-free XLog Reservation from WAL

From: "Zhou, Zhiguo" <zhiguo(dot)zhou(at)intel(dot)com>
To: wenhui qiu <qiuwenhuifx(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [RFC] Lock-free XLog Reservation from WAL
Date: 2025-01-06 06:46:03
Message-ID: b33ffba7-0253-4f11-9fd5-cfff7832c3ac@intel.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Yura and Wenhui,

Thanks for kindly reviewing this work!

On 1/3/2025 9:01 PM, wenhui qiu wrote:
> Hi
>     Thank you for your path,NUM_XLOGINSERT_LOCKS increase to 128,I
> think it will be challenged,do we make it guc ?
>

I noticed there have been some discussions (for example, [1] and its
responses) about making NUM_XLOGINSERT_LOCKS a GUC, which seems to be a
controversial proposal. Given that, we may first focus on the lock-free
XLog reservation implementation, and leave the increase of
NUM_XLOGINSERT_LOCKS for a future patch, where we would provide more
quantitative evidence for the various implementations. WDYT?

> On Fri, 3 Jan 2025 at 20:36, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru
> <mailto:y(dot)sokolov(at)postgrespro(dot)ru>> wrote:
>
> Good day, Zhiguo.
>
> Idea looks great.
>
> Minor issue:
> - you didn't remove use of `insertpos_lck` from `ReserveXLogSwitch`.
>
> I initially thought it became un-synchronized against
> `ReserveXLogInsertLocation`, but looking closer I found it is
> synchronized with `WALInsertLockAcquireExclusive`.
> Since there are no other `insertpos_lck` usages after your patch, I
> don't see why it should exists and be used in `ReserveXLogSwitch`.
>
> Still I'd prefer to see CAS loop in this place to be consistent with
> other non-locking access. And it will allow to get rid of
> `WALInsertLockAcquireExclusive`, (though probably it is not a big
> issue).
>

Exactly, it should be safe to remove `insertpos_lck`. And I agree with
you on getting rid of `WALInsertLockAcquireExclusive` with CAS loop
which should significantly reduce the synchronization cost here
especially when we intend to increase NUM_XLOGINSERT_LOCKS. I will try
it in the next version of patch.

> Major issue:
> - `SetPrevRecPtr` and `GetPrevRecPtr` do non-atomic write/read with on
> platforms where MAXALIGN != 8 or without native 64 load/store. Branch
> with 'memcpy` is rather obvious, but even pointer de-referencing on
> "lucky case" is not safe either.
>
> I have no idea how to fix it at the moment.
>

Indeed, non-atomic write/read operations can lead to safety issues in
some situations. My initial thought is to define a bit near the
prev-link to flag the completion of the update. In this way, we could
allow non-atomic or even discontinuous write/read operations on the
prev-link, while simultaneously guaranteeing its atomicity through
atomic operations (as well as memory barriers) on the flag bit. What do
you think of this as a viable solution?

> Readability issue:
> - It would be good to add `Assert(ptr >= upto)` into `GetXLogBuffer`.
> I had hard time to recognize `upto` is strictly not in the future.
> - Certainly, final version have to have fixed and improved comments.
> Many patch's ideas are strictly non-obvious. I had hard time to
> recognize patch is not a piece of ... (excuse me for the swear
> sentence).

Thanks for the suggestion and patience. It's really more readable after
inserting the assertion, I will fix it and improve other comments in the
following patches.

> Indeed, patch is much better than it looks on first sight.
> I came with alternative idea yesterday, but looking closer to your
> patch
> today I see it is superior to mine (if atomic access will be fixed).
>
> ----
>
> regards,
> Yura Sokolov aka funny-falcon
>
>

Regards,
Zhiguo

[1] https://www.postgresql.org/message-id/2266698.1704854297%40sss.pgh.pa.us

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Lakhin 2025-01-06 07:00:00 stats.sql might fail due to shared buffers also used by parallel tests
Previous Message John Naylor 2025-01-06 06:40:39 Re: Fix crash when non-creator being an iteration on shared radix tree