Re: [RFC] Lock-free XLog Reservation from WAL

From: wenhui qiu <qiuwenhuifx(at)gmail(dot)com>
To: Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Cc: "Zhou, Zhiguo" <zhiguo(dot)zhou(at)intel(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [RFC] Lock-free XLog Reservation from WAL
Date: 2025-01-03 13:01:30
Message-ID: CAGjGUAJbWgv5BB8-zm+6McUw7OqJ3paDv5CqeppTgXZk6OHF-Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi
Thank you for your path,NUM_XLOGINSERT_LOCKS increase to 128,I think it
will be challenged,do we make it guc ?

On Fri, 3 Jan 2025 at 20:36, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru> wrote:

> 02.01.2025 10:05, Zhou, Zhiguo wrote:
> > Hi all,
> >
> > I am reaching out to solicit your insights and comments on a recent
> proposal regarding the "Lock-free XLog Reservation from WAL." We have
> identified some challenges with the current WAL insertions, which require
> space reservations in the WAL buffer which involve updating two
> shared-memory statuses in XLogCtlInsert: CurrBytePos (the start position of
> the current XLog) and PrevBytePos (the prev-link to the previous XLog).
> Currently, the use of XLogCtlInsert.insertpos_lck ensures consistency but
> introduces lock contention, hindering the parallelism of XLog insertions.
> >
> > To address this issue, we propose the following changes:
> >
> > 1. Removal of PrevBytePos: This will allow increments of the CurrBytePos
> (a single uint64 field) to be implemented with an atomic operation
> (fetch_add).
> > 2. Updating Prev-Link of next XLog: Based on the fact that the prev-link
> of the next XLog always points to the head of the current Xlog,we will
> slightly exceed the reserved memory range of the current XLog to update the
> prev-link of the next XLog, regardless of which backend acquires the next
> memory space. The next XLog inserter will wait until its prev-link is
> updated for CRC calculation before starting its own XLog copy into the WAL.
> > 3. Breaking Sequential Write Convention: Each backend will update the
> prev-link of its next XLog first, then return to the header position for
> the current log insertion. This change will reduce the dependency of XLog
> writes on previous ones (compared with the sequential writes).
> > 4. Revised GetXLogBuffer: To support #3, we need update this function to
> separate the LSN it intends to access from the LSN it expects to update in
> the insertingAt field.
> > 5. Increase NUM_XLOGINSERT_LOCKS: With the above changes, increasing
> NUM_XLOGINSERT_LOCKS, for example to 128, could effectively enhance the
> parallelism.
> >
> > The attached patch could pass the regression tests (make check, make
> check-world), and in the performance test of this POC on SPR (480 vCPU)
> shows that this optimization could help the TPCC benchmark better scale
> with the core count and as a result the performance with full cores enabled
> could be improved by 2.04x.
> >
> > Before we proceed with further patch validation and refinement work, we
> are eager to hear the community's thoughts and comments on this
> optimization so that we can confirm our current work aligns with
> expectations.
>
> Good day, Zhiguo.
>
> Idea looks great.
>
> Minor issue:
> - you didn't remove use of `insertpos_lck` from `ReserveXLogSwitch`.
>
> I initially thought it became un-synchronized against
> `ReserveXLogInsertLocation`, but looking closer I found it is
> synchronized with `WALInsertLockAcquireExclusive`.
> Since there are no other `insertpos_lck` usages after your patch, I
> don't see why it should exists and be used in `ReserveXLogSwitch`.
>
> Still I'd prefer to see CAS loop in this place to be consistent with
> other non-locking access. And it will allow to get rid of
> `WALInsertLockAcquireExclusive`, (though probably it is not a big issue).
>
> Major issue:
> - `SetPrevRecPtr` and `GetPrevRecPtr` do non-atomic write/read with on
> platforms where MAXALIGN != 8 or without native 64 load/store. Branch
> with 'memcpy` is rather obvious, but even pointer de-referencing on
> "lucky case" is not safe either.
>
> I have no idea how to fix it at the moment.
>
> Readability issue:
> - It would be good to add `Assert(ptr >= upto)` into `GetXLogBuffer`.
> I had hard time to recognize `upto` is strictly not in the future.
> - Certainly, final version have to have fixed and improved comments.
> Many patch's ideas are strictly non-obvious. I had hard time to
> recognize patch is not a piece of ... (excuse me for the swear sentence).
>
> Indeed, patch is much better than it looks on first sight.
> I came with alternative idea yesterday, but looking closer to your patch
> today I see it is superior to mine (if atomic access will be fixed).
>
> ----
>
> regards,
> Yura Sokolov aka funny-falcon
>
>
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bertrand Drouvot 2025-01-03 13:14:01 Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
Previous Message Yura Sokolov 2025-01-03 12:36:45 Re: [RFC] Lock-free XLog Reservation from WAL