From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Andres Freund <andres(at)anarazel(dot)de>, "Zhou, Zhiguo" <zhiguo(dot)zhou(at)intel(dot)com>
Subject: Re: Get rid of WALBufMappingLock
Date: 2025-02-07 11:39:09
Message-ID: CAPpHfduc9YMmGcBocD=03F+rbMV0XpA2KoHQC9bQN1QEpPjvPw@mail.gmail.com
Lists: pgsql-hackers
On Fri, Feb 7, 2025 at 1:24 PM Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru> wrote:
> 07.02.2025 14:02, Yura Sokolov wrote:
> > 07.02.2025 01:26, Alexander Korotkov wrote:
> >> Hi!
> >>
> >> On Sun, Jan 19, 2025 at 2:11 AM Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru> wrote:
> >>>
> >>> During discussion of Increasing NUM_XLOGINSERT_LOCKS [1], Andres Freund
> >>> used a benchmark which creates WAL records very intensively. While I think
> >>> it is not completely fair (1MB log records are really rare), it pushed
> >>> me to analyze the write-side waiting of the XLog machinery.
> >>>
> >>> First I tried to optimize WaitXLogInsertionsToFinish, but without great
> >>> success (yet).
> >>>
> >>> While profiling, I found that a lot of time is spent in memory clearing
> >>> under the global WALBufMappingLock:
> >>>
> >>> MemSet((char *) NewPage, 0, XLOG_BLCKSZ);
> >>>
> >>> It is an obvious scalability bottleneck.
> >>>
> >>> So "challenge was accepted".
> >>>
> >>> Certainly, backends should initialize pages without the exclusive lock.
> >>> But then how do we ensure pages were initialized? In other words, how
> >>> do we ensure XLogCtl->InitializedUpTo is correct?
> >>>
> >>> I've tried to play around with WALBufMappingLock, holding it for a short
> >>> time and spinning on XLogCtl->xlblocks[nextidx]. But in the end I found
> >>> WALBufMappingLock is not needed at all.
> >>>
> >>> Instead of holding a lock, it is better to let backends cooperate:
> >>> - I bound a ConditionVariable to each xlblocks entry,
> >>> - every backend now checks that every required block pointed to by
> >>> InitializedUpTo was successfully initialized, or sleeps on that block's condvar,
> >>> - when a backend is sure the block is initialized, it tries to update
> >>> InitializedUpTo and wakes waiters on the condition variable.
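
For illustration, a minimal sketch of the wait side of such a scheme, using
PostgreSQL's condition-variable API; the per-buffer condvar array xlblocks_cv
and the wait event name are hypothetical, not taken from the patch:

    /* Wait until the buffer at idx covers the page ending at endptr. */
    static void
    WaitBufferInit(int idx, XLogRecPtr endptr)
    {
        ConditionVariablePrepareToSleep(&XLogCtl->xlblocks_cv[idx]);
        while (pg_atomic_read_u64(&XLogCtl->xlblocks[idx]) < endptr)
            ConditionVariableSleep(&XLogCtl->xlblocks_cv[idx],
                                   WAIT_EVENT_WAL_BUFFER_INIT);
        ConditionVariableCancelSleep();
    }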
> >>
> >> Looks reasonable to me, but having a ConditionVariable per xlog buffer
> >> seems like overkill.  Find attached a revision where I've implemented
> >> advancing InitializedUpTo without per-buffer ConditionVariables.
> >> After the initialization of each buffer there is an attempt to CAS
> >> InitializedUpTo in a loop.  So, multiple processes will try to advance
> >> InitializedUpTo; they could hijack the initiative from each other, but
> >> there is always a leader which will finish the work.
> >>
> >> There is only one ConditionVariable, used to wait for InitializedUpTo
> >> to be advanced.
> >>
> >> I didn't benchmark my version, just checked that tests passed.
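
For illustration, a sketch of what such an advance loop could look like; the
atomic InitializedUpTo, the InitializedUpToCondVar name, and the exact
bookkeeping are assumptions here, not quoted from the patch:

    /*
     * After zeroing the page [NewPageBeginPtr, NewPageEndPtr), try to
     * advance InitializedUpTo over it, and then over any further pages
     * other backends have already initialized.  A failed CAS means
     * another backend took over the advance; a gap in xlblocks means
     * the owner of the next buffer will continue from there.
     */
    XLogRecPtr  expected = NewPageBeginPtr;
    XLogRecPtr  endptr = NewPageEndPtr;

    while (pg_atomic_compare_exchange_u64(&XLogCtl->InitializedUpTo,
                                          &expected, endptr))
    {
        int         nextidx = XLogRecPtrToBufIdx(endptr);

        if (pg_atomic_read_u64(&XLogCtl->xlblocks[nextidx]) !=
            endptr + XLOG_BLCKSZ)
            break;              /* next page not initialized yet */
        expected = endptr;      /* we are the leader; keep advancing */
        endptr += XLOG_BLCKSZ;
    }
    ConditionVariableBroadcast(&XLogCtl->InitializedUpToCondVar);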
> >
> > Good day, Alexander.
>
> Seems I was mistaken twice.
>
> > I've got mixed but quite close results for both approaches (single or many
> > ConditionVariables) on my notebook.  Since I have no access to a larger
> > machine, I can't prove "many" is way better (or find it worse).
>
> 1. "many condvars" (my variant) is strictly worse with num locks = 64 and
> when pg_logical_emit_message emits just 1kB instead of 1MB.
>
> Therefore, "single condvar" is strictly better.
>
> > Given the patch after cleanup looks a bit smaller and clearer, I agree to
> > keep just a single condition variable.
> >
> > The cleaned version is attached.
> >
> > I've changed the condition for broadcast a bit ("less than" instead of
> > "not equal"):
> > - the buffer's border may already have gone into the future,
> > - and then another backend will reach a not-yet-initialized buffer and
> > will broadcast.
>
> 2. I've inserted an abort for the case "buffer's border went into the
> future", and wasn't able to trigger it.
>
> So I reverted the update loop's body to your variant.

Good, thank you.  I think the 0001 patch is generally good but needs some
further polishing, e.g. more comments explaining how it works.

Regarding the 0002 patch, it looks generally reasonable.  But are 2
attempts always optimal?  Are there cases of regression, or cases when
more attempts are even better?  Could we have some self-adjusting
mechanism there, like what we have for spinlocks?
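
For reference, the spinlock self-tuning machinery lives behind SpinDelayStatus
(storage/s_lock.h).  A rough sketch of wrapping the retries in it, with
TryInitializeNextPage() as a hypothetical placeholder for one attempt of the
0002 patch's work:

    #include "storage/s_lock.h"

    SpinDelayStatus delayStatus;

    init_local_spin_delay(&delayStatus);
    while (!TryInitializeNextPage())    /* placeholder for one attempt */
        perform_spin_delay(&delayStatus);
    /* Feeds the outcome back into spins_per_delay, so the number of
     * cheap retries adapts to the machine and the contention level. */
    finish_spin_delay(&delayStatus);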
------
Regards,
Alexander Korotkov
Supabase