Re: Get rid of WALBufMappingLock

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Cc: "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, "Zhou, Zhiguo" <zhiguo(dot)zhou(at)intel(dot)com>
Subject: Re: Get rid of WALBufMappingLock
Date: 2025-02-13 09:34:38
Message-ID: CAPpHfdtr0Gj0K88yBjhk8Rq-zT6WFSxC+eSCmMmywNc4pgr=rQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Feb 12, 2025 at 8:16 PM Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru> wrote:
> 08.02.2025 13:07, Alexander Korotkov пишет:
> > On Fri, Feb 7, 2025 at 1:39 PM Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
> >> Good, thank you. I think 0001 patch is generally good, but needs some
> >> further polishing, e.g. more comments explaining how does it work.
>
> I tried to add more comments. I'm not good at, so recommendations are welcome.
>
> > Two things comes to my mind worth rechecking about 0001.
> > 1) Are XLogCtl->InitializeReserved, XLogCtl->InitializedUpTo and
> > XLogCtl->xlblocks always page-aligned? Because algorithm seems to be
> > sensitive to that. If so, I would propose to explicitly comment that
> > and add corresponding asserts.
>
> They're certainly page aligned, since they are page borders.
> I added assert on alignment of InitializeReserved for the sanity.
>
> > 2) Check if there are concurrency issues between
> > AdvanceXLInsertBuffer() and switching to the new WAL file.
>
> There are no issues:
> 1. CopyXLogRecordToWAL for isLogSwitch follows same protocol, ie uses
> GetXLogBuffer to zero-out WAL page.
> 2. WALINSERT_SPECIAL_SWITCH forces exclusive lock on all insertion locks,
> so switching wal is not concurrent. (Although, there is no need in this
> exclusiveness, imho.)

Good, thank you. I've also revised commit message and comments.

But I see another issue with this patch. In the worst case, we do
XLogWrite() by ourselves, and it could potentially could error out.
Without patch, that would cause WALBufMappingLock be released and
XLogCtl->InitializedUpTo not advanced. With the patch, that would
cause other processes infinitely waiting till we finish the
initialization.

Possible solution would be to save position of the page to be
initialized, and set it back to XLogCtl->InitializeReserved on error
(everywhere we do LWLockReleaseAll()). We also must check that on
error we only set XLogCtl->InitializeReserved to the past, because
there could be multiple concurrent failures. Also we need to
broadcast XLogCtl->InitializedUpToCondVar to wake up waiters.

> > Regarding 0002 patch, it looks generally reasonable. But are 2
> > attempts always optimal? Are there cases of regression, or cases when
> > more attempts are even better? Could we have there some
> > self-adjusting mechanism like what we have for spinlocks?
>
> Well, I chose to perform 3 probes (2 conditional attempts + 1
> unconditional) based on intuition. I have some experience in building hash
> tables, and cuckoo-hashing theory tells 2 probes is usually enough to reach
> 50% fill-rate, and 3 probes enough for ~75% fill rate. Since each probe is
> cache miss, it is hardly sensible to do more probes.
>
> 3 probes did better than 2 in other benchmark [1], although there were
> NUM_XLOGINSERT_LOCK increased.
>
> Excuse me for not bencmarking different choices here. I'll try to do
> measurements in next days.
>
> [1] https://postgr.es/m/3b11fdc2-9793-403d-b3d4-67ff9a00d447%40postgrespro.ru

Ok, let's wait for your measurements.

------
Regards,
Alexander Korotkov
Supabase

Attachment Content-Type Size
v5-0002-several-attempts-to-lock-WALInsertLocks.patch application/octet-stream 3.0 KB
v5-0001-Get-rid-of-WALBufMappingLock.patch application/octet-stream 16.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Yura Sokolov 2025-02-13 09:45:26 Re: Get rid of WALBufMappingLock
Previous Message Ilia Evdokimov 2025-02-13 09:05:55 Re: explain analyze rows=%.0f