Re: Get rid of WALBufMappingLock

From: Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, "Zhou, Zhiguo" <zhiguo(dot)zhou(at)intel(dot)com>
Subject: Re: Get rid of WALBufMappingLock
Date: 2025-02-13 16:39:32
Message-ID: CALT9ZEELhvTOD=Cs=QOG=T42yWAdmD53M5ELqJF53tefffuHrQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi, Yura and Alexander!

On Thu, 13 Feb 2025 at 14:08, Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
>
> On Thu, Feb 13, 2025 at 11:45 AM Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru> wrote:
> > 13.02.2025 12:34, Alexander Korotkov пишет:
> > > On Wed, Feb 12, 2025 at 8:16 PM Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru> wrote:
> > >> 08.02.2025 13:07, Alexander Korotkov пишет:
> > >>> On Fri, Feb 7, 2025 at 1:39 PM Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
> > >>>> Good, thank you. I think 0001 patch is generally good, but needs some
> > >>>> further polishing, e.g. more comments explaining how does it work.
> > >>
> > >> I tried to add more comments. I'm not good at, so recommendations are welcome.
> > >>
> > >>> Two things comes to my mind worth rechecking about 0001.
> > >>> 1) Are XLogCtl->InitializeReserved, XLogCtl->InitializedUpTo and
> > >>> XLogCtl->xlblocks always page-aligned? Because algorithm seems to be
> > >>> sensitive to that. If so, I would propose to explicitly comment that
> > >>> and add corresponding asserts.
> > >>
> > >> They're certainly page aligned, since they are page borders.
> > >> I added assert on alignment of InitializeReserved for the sanity.
> > >>
> > >>> 2) Check if there are concurrency issues between
> > >>> AdvanceXLInsertBuffer() and switching to the new WAL file.
> > >>
> > >> There are no issues:
> > >> 1. CopyXLogRecordToWAL for isLogSwitch follows same protocol, ie uses
> > >> GetXLogBuffer to zero-out WAL page.
> > >> 2. WALINSERT_SPECIAL_SWITCH forces exclusive lock on all insertion locks,
> > >> so switching wal is not concurrent. (Although, there is no need in this
> > >> exclusiveness, imho.)
> > >
> > > Good, thank you. I've also revised commit message and comments.
> > >
> > > But I see another issue with this patch. In the worst case, we do
> > > XLogWrite() by ourselves, and it could potentially could error out.
> > > Without patch, that would cause WALBufMappingLock be released and
> > > XLogCtl->InitializedUpTo not advanced. With the patch, that would
> > > cause other processes infinitely waiting till we finish the
> > > initialization.
> > >
> > > Possible solution would be to save position of the page to be
> > > initialized, and set it back to XLogCtl->InitializeReserved on error
> > > (everywhere we do LWLockReleaseAll()). We also must check that on
> > > error we only set XLogCtl->InitializeReserved to the past, because
> > > there could be multiple concurrent failures. Also we need to
> > > broadcast XLogCtl->InitializedUpToCondVar to wake up waiters.
> >
> > The single place where AdvanceXLInsertBuffer is called outside of critical
> > section is in XLogBackgroundFlush. All other call stacks will issue server
> > restart if XLogWrite will raise error inside of AdvanceXLInsertBuffer.
> >
> > XLogBackgroundFlush explicitely avoids writing buffers by passing
> > opportunistic=true parameter.
> >
> > Therefore, error in XLogWrite will not cause hang in AdvanceXLInsertBuffer
> > since server will shutdown/restart.
> >
> > Perhaps, we just need to insert `Assert(CritSectionCount > 0);` before call
> > to XLogWrite here?
>
> You're correct. I just reflected this in the next revision of the patch.

I've looked at the patchset v6.

The overall goal to avoid locking looks good. Both patches look right
to me. The only thing I'm slightly concerned about if patchset could
demonstrate performance differences in any real workload. I appreciate
it as a beautiful optimization anyway.

I have the following proposals to change the code and comments:

For patch 0001:

- Struct XLBlocks contains just one pg_atomic_uint64 member. Is it
still needed as a struct? These changes make a significant volume of
changes to the patch, being noop. Maybe it was inherited from v1 and
not needed anymore.

- Furthermore when xlblocks became a struct comments like:
> and xlblocks values certainly do. xlblocks values are changed
need to be changed to xlblocks.bound. This could be avoided by
changing back xlblocks from type XLBlocks * to pg_atomic_uint64 *

- It's worth more detailed commenting
InitializedUpTo/InitializedUpToCondVar than:
+ * It is updated to successfully initialized buffer's
identities, perhaps
+ * waiting on conditional variable bound to buffer.

"perhaps waiting" could also be in style "maybe/even while AAA waits BBB"

"lock-free with cooperation with" -> "lock-free accompanied by changes to..."

- Comment inside typedef struct XLogCtlData:
/* 1st byte ptr-s + XLOG_BLCKSZ (and condvar * for) */
need to be returned back
/* 1st byte ptr-s + XLOG_BLCKSZ */

- Commented out code for cleanup in the final patch:
//ConditionVariableBroadcast(&XLogCtl->xlblocks[nextidx].condvar);

- in AdvanceXLInsertBuffer()
npages initialised to 0 but it is not increased anymore
Block under
> if (XLOG_DEBUG && npages > 0)
became unreachable

(InvalidXLogRecPtr + 1) is essentially 0+1 and IMO this semantically
calls for adding #define FirstValidXLogRecPtr 1

Typo in a commit message: %s/usign/using/g

For patch 0002:

I think Yura's explanation from above in this thread need to get place
in a commit message and in a comment to this:
> int attempts = 2;

Comments around:
"try another lock next time" could be modified to reflect that we do
repeat twice

Kind regards,
Pavel Borisov
Supabase

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Sami Imseih 2025-02-13 16:44:33 Re: [PATCH] Optionally record Plan IDs to track plan changes for a query
Previous Message Tomas Vondra 2025-02-13 16:28:53 Re: BitmapHeapScan streaming read user and prelim refactoring