Re: [RFC] Lock-free XLog Reservation from WAL

From: Japin Li <japinli(at)hotmail(dot)com>
To: Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Cc: "Zhou, Zhiguo" <zhiguo(dot)zhou(at)intel(dot)com>, wenhui qiu <qiuwenhuifx(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [RFC] Lock-free XLog Reservation from WAL
Date: 2025-01-22 06:09:48
Message-ID: ME0P300MB04457D905300032F52C291B2B6E12@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, 19 Jan 2025 at 17:56, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru> wrote:
> 17.01.2025 17:00, Zhou, Zhiguo пишет:
>> On 1/16/2025 10:00 PM, Yura Sokolov wrote:
>>>
>>> Good day, Zhiguo.
>>>
>>> Excuse me, I feel sneaky a bit, but I've started another thread
>>> just about increase of NUM_XLOGINSERT_LOCK, because I can measure
>>> its effect even on my working notebook (it is another one: Ryzen
>>> 5825U limited to @2GHz).
>>>
>>> http://postgr.es/m/flat/3b11fdc2-9793-403d-
>>> b3d4-67ff9a00d447%40postgrespro.ru
>>>
>>> -----
>>>
>>> regards
>>> Yura Sokolov aka funny-falcon
>>>
>>>
>> Good day, Yura!
>> Thank you for keeping me informed. I appreciate your proactive
>> approach and understand the importance of exploring different angles
>> for optimization. Your patch is indeed fundamental to our ongoing
>> work on the lock-free xlog reservation, and I'm eager to see how it
>> can further enhance our efforts.
>> I will proceed to test the performance impact of your latest patch
>> when combined with the lock-free xlog reservation patch. This will
>> help us determine if there's potential for additional
>> optimization. Concurrently, with your permission, I'll try to refine
>> the hash-table- based implementation for your further review. WDYT?
>>
>
> Good day, Zhiguo
>
> Here's version of "hash-table reservation" with both 32bit and 64bit
> operations (depending on PG_HAVE_ATOMIC_U64_SIMULATION, or may be
> switched by hand).
>
> 64bit version uses other protocol with a bit lesser atomic
> operations. I suppose it could be a bit faster. But I can't prove it
> now.
>
> btw, you wrote:
>
>>> Major issue:
>>> - `SetPrevRecPtr` and `GetPrevRecPtr` do non-atomic write/read
> with on
>>> platforms where MAXALIGN != 8 or without native 64
> load/store. Branch
>>> with 'memcpy` is rather obvious, but even pointer de-referencing on
>>> "lucky case" is not safe either.
>>>
>>> I have no idea how to fix it at the moment.
>>>
>>
>> Indeed, non-atomic write/read operations can lead to safety issues in
>> some situations. My initial thought is to define a bit near the
>> prev-link to flag the completion of the update. In this way, we could
>> allow non-atomic or even discontinuous write/read operations on the
>> prev-link, while simultaneously guaranteeing its atomicity through
>> atomic operations (as well as memory barriers) on the flag bit. What
>> do you think of this as a viable solution?
>
> There is a way to order operations:
> - since SetPrevRecPtr stores start of record as LSN, its lower 32bits
> are certainly non-zero (record could not start at the beginning of a
> page).
> - so SetPrevRecPtr should write high 32bits, issue write barrier, and
> then write lower 32bits,
> - and then GetPrevRecPtr should first read lower 32bits, and if it is
> not zero, then issue read barrier and read upper 32bits.
>
> This way you will always read correct prev-rec-ptr on platform without
> 64bit atomics. (because MAXALING >= 4 and PostgreSQL requires 4 byte
> atomicity for several years).
>

Hi, Yura Sokolov

Thanks for updating the patch.
I test the v2 patch using BenchmarkSQL 1000 warehouse, and here is the tpmC
result:

case | min | avg | max
--------------------+------------+------------+--------------
master (patched) | 988,461.89 | 994,916.50 | 1,000,362.40
master (44b61efb79) | 857,028.07 | 863,174.59 | 873,856.92

The patch provides a significant improvement.

I just looked through the patch, here are some comments.

1.
The v2 patch can't be applied cleanly.

Applying: Lock-free XLog Reservation using lock-free hash-table
.git/rebase-apply/patch:33: trailing whitespace.

.git/rebase-apply/patch:37: space before tab in indent.
{
.git/rebase-apply/patch:38: space before tab in indent.
int i;
.git/rebase-apply/patch:39: trailing whitespace.

.git/rebase-apply/patch:46: space before tab in indent.
LWLockReleaseClearVar(&WALInsertLocks[i].l.lock,
warning: squelched 4 whitespace errors
warning: 9 lines add whitespace errors.

2.
And there is a typo:

+ * PrevLinksHash is a lock-free hash table based on Cuckoo algorith. It is
+ * mostly 4 way: for every element computed two positions h1, h2, and

s/algorith/algorithm/g

--
Regrads,
Japin Li

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amul Sul 2025-01-22 06:12:15 Re: Allow NOT VALID foreign key constraints on partitioned tables.
Previous Message Michael Paquier 2025-01-22 05:50:47 Re: pg_createsubscriber TAP test wrapping makes command options hard to read.