From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Andres Freund <andres(at)anarazel(dot)de>, "Zhou, Zhiguo" <zhiguo(dot)zhou(at)intel(dot)com>
Subject: Re: Get rid of WALBufMappingLock
Date: 2025-02-07 11:39:09
Message-ID: CAPpHfduc9YMmGcBocD=03F+rbMV0XpA2KoHQC9bQN1QEpPjvPw@mail.gmail.com
Lists: pgsql-hackers
On Fri, Feb 7, 2025 at 1:24 PM Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru> wrote:
> 07.02.2025 14:02, Yura Sokolov wrote:
> > 07.02.2025 01:26, Alexander Korotkov wrote:
> >> Hi!
> >>
> >> On Sun, Jan 19, 2025 at 2:11 AM Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru> wrote:
> >>>
> >>> During discussion of Increasing NUM_XLOGINSERT_LOCKS [1], Andres Freund
> >>> used a benchmark which creates WAL records very intensively. While I think
> >>> it is not completely fair (1MB log records are really rare), it pushed
> >>> me to analyze the write-side waiting of the XLog machinery.
> >>>
> >>> First I tried to optimize WaitXLogInsertionsToFinish, but without great
> >>> success (yet).
> >>>
> >>> While profiling, I found that a lot of time is spent in memory clearing
> >>> under the global WALBufMappingLock:
> >>>
> >>> MemSet((char *) NewPage, 0, XLOG_BLCKSZ);
> >>>
> >>> It is an obvious scalability bottleneck.
> >>>
> >>> So "challenge was accepted".
> >>>
> >>> Certainly, backends should initialize pages without the exclusive lock.
> >>> But then how do we ensure pages were initialized? In other words, how
> >>> do we ensure XLogCtl->InitializedUpTo is correct?
> >>>
> >>> I've tried to play around with WALBufMappingLock, holding it for a short
> >>> time and spinning on XLogCtl->xlblocks[nextidx]. But in the end I found
> >>> WALBufMappingLock is not needed at all.
> >>>
> >>> Instead of holding a lock, it is better to let backends cooperate:
> >>> - I bound a ConditionVariable to each xlblocks entry,
> >>> - every backend now checks that every required block pointed to by
> >>> InitializedUpTo was successfully initialized, or sleeps on that block's condvar,
> >>> - when a backend is sure the block is initialized, it tries to update
> >>> InitializedUpTo and wakes waiters on the condition variable.
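
For illustration, a minimal sketch of the wait side of such a scheme, using
PostgreSQL's condition-variable API; the per-buffer condvar array xlblocks_cv
and the wait event name are hypothetical, not taken from the patch:

    /* Wait until the buffer at idx covers the page ending at endptr. */
    static void
    WaitBufferInit(int idx, XLogRecPtr endptr)
    {
        ConditionVariablePrepareToSleep(&XLogCtl->xlblocks_cv[idx]);
        while (pg_atomic_read_u64(&XLogCtl->xlblocks[idx]) < endptr)
            ConditionVariableSleep(&XLogCtl->xlblocks_cv[idx],
                                   WAIT_EVENT_WAL_BUFFER_INIT);
        ConditionVariableCancelSleep();
    }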
> >>
> >> Looks reasonable to me, but having a ConditionVariable per xlog buffer
> >> seems like overkill.  Find attached a revision where I've implemented
> >> advancing InitializedUpTo without per-buffer ConditionVariables.
> >> After the initialization of each buffer there is an attempt to CAS
> >> InitializedUpTo in a loop.  So, multiple processes will try to advance
> >> InitializedUpTo; they could hijack the initiative from each other, but
> >> there is always a leader which will finish the work.
> >>
> >> There is only one ConditionVariable, used to wait for InitializedUpTo
> >> to be advanced.
> >>
> >> I didn't benchmark my version, just checked that tests passed.
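
For illustration, a sketch of what such an advance loop could look like; the
atomic InitializedUpTo, the InitializedUpToCondVar name, and the exact
bookkeeping are assumptions here, not quoted from the patch:

    /*
     * After zeroing the page [NewPageBeginPtr, NewPageEndPtr), try to
     * advance InitializedUpTo over it, and then over any further pages
     * other backends have already initialized.  A failed CAS means
     * another backend took over the advance; a gap in xlblocks means
     * the owner of the next buffer will continue from there.
     */
    XLogRecPtr  expected = NewPageBeginPtr;
    XLogRecPtr  endptr = NewPageEndPtr;

    while (pg_atomic_compare_exchange_u64(&XLogCtl->InitializedUpTo,
                                          &expected, endptr))
    {
        int         nextidx = XLogRecPtrToBufIdx(endptr);

        if (pg_atomic_read_u64(&XLogCtl->xlblocks[nextidx]) !=
            endptr + XLOG_BLCKSZ)
            break;              /* next page not initialized yet */
        expected = endptr;      /* we are the leader; keep advancing */
        endptr += XLOG_BLCKSZ;
    }
    ConditionVariableBroadcast(&XLogCtl->InitializedUpToCondVar);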
> >
> > Good day, Alexander.
>
> Seems I was mistaken twice.
>
> > I've got mixed but quite close results for both approaches (single or many
> > ConditionVariables) on my notebook.  Since I have no access to a larger
> > machine, I can't prove "many" is way better (or find it worse).
>
> 1. "many condvars" (my variant) is strictly worse with num locks = 64 and
> when pg_logical_emit_message emits just 1kB instead of 1MB.
>
> Therefore, "single condvar" is strictly better.
>
> > Given the patch after cleanup looks a bit smaller and clearer, I agree to
> > keep just a single condition variable.
> >
> > The cleaned version is attached.
> >
> > I've changed the condition for broadcast a bit ("less than" instead of
> > "not equal"):
> > - the buffer's border may already have gone into the future,
> > - and then another backend will reach a not-yet-initialized buffer and
> > will broadcast.
>
> 2. I've inserted an abort for the case "buffer's border went into the
> future", and wasn't able to trigger it.
>
> So I reverted the update loop's body to your variant.

Good, thank you.  I think the 0001 patch is generally good but needs some
further polishing, e.g. more comments explaining how it works.

Regarding the 0002 patch, it looks generally reasonable.  But are 2
attempts always optimal?  Are there cases of regression, or cases when
more attempts are even better?  Could we have some self-adjusting
mechanism there, like what we have for spinlocks?
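
For reference, the spinlock self-tuning machinery lives behind SpinDelayStatus
(storage/s_lock.h).  A rough sketch of wrapping the retries in it, with
TryInitializeNextPage() as a hypothetical placeholder for one attempt of the
0002 patch's work:

    #include "storage/s_lock.h"

    SpinDelayStatus delayStatus;

    init_local_spin_delay(&delayStatus);
    while (!TryInitializeNextPage())    /* placeholder for one attempt */
        perform_spin_delay(&delayStatus);
    /* Feeds the outcome back into spins_per_delay, so the number of
     * cheap retries adapts to the machine and the contention level. */
    finish_spin_delay(&delayStatus);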
------
Regards,
Alexander Korotkov
Supabase