From: | Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru> |
---|---|
To: | Japin Li <japinli(at)hotmail(dot)com> |
Cc: | "Zhou, Zhiguo" <zhiguo(dot)zhou(at)intel(dot)com>, wenhui qiu <qiuwenhuifx(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [RFC] Lock-free XLog Reservation from WAL |
Date: | 2025-01-22 07:25:52 |
Message-ID: | aaa0d579-b990-4464-aa6b-eade52fe1cac@postgrespro.ru |
Lists: | pgsql-hackers |
22.01.2025 09:09, Japin Li пишет:
> On Sun, 19 Jan 2025 at 17:56, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru> wrote:
>> 17.01.2025 17:00, Zhou, Zhiguo пишет:
>>> On 1/16/2025 10:00 PM, Yura Sokolov wrote:
>>>>
>>>> Good day, Zhiguo.
>>>>
>>>> Excuse me, I feel a bit sneaky, but I've started another thread
>>>> just about increasing NUM_XLOGINSERT_LOCKS, because I can measure
>>>> its effect even on my working notebook (it is another one: a Ryzen
>>>> 5825U limited to 2GHz).
>>>>
>>>> http://postgr.es/m/flat/3b11fdc2-9793-403d-b3d4-67ff9a00d447%40postgrespro.ru
>>>>
>>>> -----
>>>>
>>>> regards
>>>> Yura Sokolov aka funny-falcon
>>>>
>>>>
>>> Good day, Yura!
>>> Thank you for keeping me informed. I appreciate your proactive
>>> approach and understand the importance of exploring different angles
>>> for optimization. Your patch is indeed fundamental to our ongoing
>>> work on the lock-free xlog reservation, and I'm eager to see how it
>>> can further enhance our efforts.
>>> I will proceed to test the performance impact of your latest patch
>>> when combined with the lock-free xlog reservation patch. This will
>>> help us determine if there's potential for additional
>>> optimization. Concurrently, with your permission, I'll try to refine
>>> the hash-table-based implementation for your further review. WDYT?
>>>
>>
>> Good day, Zhiguo
>>
>> Here's a version of the "hash-table reservation" with both 32-bit and
>> 64-bit operations (depending on PG_HAVE_ATOMIC_U64_SIMULATION, or it
>> may be switched by hand).
>>
>> The 64-bit version uses a different protocol with slightly fewer
>> atomic operations. I suppose it could be a bit faster, but I can't
>> prove it now.
>>
>> btw, you wrote:
>>
>>>> Major issue:
>>>> - `SetPrevRecPtr` and `GetPrevRecPtr` do non-atomic write/read on
>>>> platforms where MAXALIGN != 8 or without native 64-bit load/store.
>>>> The branch with `memcpy` is rather obvious, but even pointer
>>>> de-referencing in the "lucky case" is not safe either.
>>>>
>>>> I have no idea how to fix it at the moment.
>>>>
>>>
>>> Indeed, non-atomic write/read operations can lead to safety issues in
>>> some situations. My initial thought is to define a bit near the
>>> prev-link to flag the completion of the update. In this way, we could
>>> allow non-atomic or even discontinuous write/read operations on the
>>> prev-link, while simultaneously guaranteeing its atomicity through
>>> atomic operations (as well as memory barriers) on the flag bit. What
>>> do you think of this as a viable solution?
>>
>> There is a way to order the operations:
>> - since SetPrevRecPtr stores the start of the record as an LSN, its
>> lower 32 bits are certainly non-zero (a record cannot start at the
>> very beginning of a page).
>> - so SetPrevRecPtr should write the high 32 bits, issue a write
>> barrier, and then write the lower 32 bits,
>> - and then GetPrevRecPtr should first read the lower 32 bits, and if
>> they are not zero, issue a read barrier and read the upper 32 bits.
>>
>> This way you will always read a correct prev-rec-ptr on platforms
>> without 64-bit atomics (because MAXALIGN >= 4, and PostgreSQL has
>> required 4-byte atomicity for several years).
>>
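[Editor's note: the ordering protocol quoted above can be sketched in standalone C as below. The names (PrevLink, SetPrevRecPtr, GetPrevRecPtr) follow the discussion but the struct layout is hypothetical, and C11 fences stand in for PostgreSQL's pg_write_barrier()/pg_read_barrier(); this is an illustration of the technique, not the patch's actual code.]

```c
#include <stdatomic.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

/* Hypothetical storage for a prev-rec-ptr split into 32-bit halves. */
typedef struct
{
	volatile uint32_t hi;	/* upper 32 bits */
	volatile uint32_t lo;	/* lower 32 bits; non-zero means "published" */
} PrevLink;

static void
SetPrevRecPtr(PrevLink *link, XLogRecPtr ptr)
{
	/*
	 * A record never starts at the very beginning of a page, so the
	 * lower 32 bits of a valid LSN are guaranteed non-zero.
	 */
	link->hi = (uint32_t) (ptr >> 32);
	/* write barrier: the high half must be visible before the low half */
	atomic_thread_fence(memory_order_release);
	link->lo = (uint32_t) ptr;
}

static XLogRecPtr
GetPrevRecPtr(PrevLink *link)
{
	uint32_t	lo = link->lo;

	if (lo == 0)
		return 0;				/* not yet published */
	/* read barrier: low half read first, then the high half */
	atomic_thread_fence(memory_order_acquire);
	return ((XLogRecPtr) link->hi << 32) | lo;
}
```

Because readers only trust the value once the (necessarily non-zero) low half is visible, and the barriers order the two 32-bit accesses, a torn 64-bit read can never be mistaken for a valid prev-rec-ptr.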
>
> Hi, Yura Sokolov
>
> Thanks for updating the patch.
> I tested the v2 patch using BenchmarkSQL with 1000 warehouses, and
> here is the tpmC result:
>
> case | min | avg | max
> --------------------+------------+------------+--------------
> master (patched) | 988,461.89 | 994,916.50 | 1,000,362.40
> master (44b61efb79) | 857,028.07 | 863,174.59 | 873,856.92
>
> The patch provides a significant improvement.
>
> I just looked through the patch, here are some comments.
>
> 1.
> The v2 patch can't be applied cleanly.
>
> Applying: Lock-free XLog Reservation using lock-free hash-table
> .git/rebase-apply/patch:33: trailing whitespace.
>
> .git/rebase-apply/patch:37: space before tab in indent.
> {
> .git/rebase-apply/patch:38: space before tab in indent.
> int i;
> .git/rebase-apply/patch:39: trailing whitespace.
>
> .git/rebase-apply/patch:46: space before tab in indent.
> LWLockReleaseClearVar(&WALInsertLocks[i].l.lock,
> warning: squelched 4 whitespace errors
> warning: 9 lines add whitespace errors.
>
> 2.
> And there is a typo:
>
> + * PrevLinksHash is a lock-free hash table based on Cuckoo algorith. It is
> + * mostly 4 way: for every element computed two positions h1, h2, and
>
> s/algorith/algorithm/g
Hi, Japin
Thank you very much for the measurements and comments.
May I ask you to compare not only against master, but also against a
straight increase of NUM_XLOGINSERT_LOCKS to 128?
That way the profit from the added complexity will be more obvious:
does it pay for itself or not.
-------
regards
Yura