From: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
---|---|
To: | Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru> |
Cc: | "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, "Zhou, Zhiguo" <zhiguo(dot)zhou(at)intel(dot)com> |
Subject: | Re: Get rid of WALBufMappingLock |
Date: | 2025-02-13 09:34:38 |
Message-ID: | CAPpHfdtr0Gj0K88yBjhk8Rq-zT6WFSxC+eSCmMmywNc4pgr=rQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Feb 12, 2025 at 8:16 PM Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru> wrote:
> 08.02.2025 13:07, Alexander Korotkov пишет:
> > On Fri, Feb 7, 2025 at 1:39 PM Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
> >> Good, thank you. I think 0001 patch is generally good, but needs some
> >> further polishing, e.g. more comments explaining how does it work.
>
> I tried to add more comments. I'm not good at, so recommendations are welcome.
>
> > Two things comes to my mind worth rechecking about 0001.
> > 1) Are XLogCtl->InitializeReserved, XLogCtl->InitializedUpTo and
> > XLogCtl->xlblocks always page-aligned? Because algorithm seems to be
> > sensitive to that. If so, I would propose to explicitly comment that
> > and add corresponding asserts.
>
> They're certainly page aligned, since they are page borders.
> I added assert on alignment of InitializeReserved for the sanity.
>
> > 2) Check if there are concurrency issues between
> > AdvanceXLInsertBuffer() and switching to the new WAL file.
>
> There are no issues:
> 1. CopyXLogRecordToWAL for isLogSwitch follows same protocol, ie uses
> GetXLogBuffer to zero-out WAL page.
> 2. WALINSERT_SPECIAL_SWITCH forces exclusive lock on all insertion locks,
> so switching wal is not concurrent. (Although, there is no need in this
> exclusiveness, imho.)
Good, thank you. I've also revised commit message and comments.
But I see another issue with this patch. In the worst case, we do
XLogWrite() by ourselves, and it could potentially could error out.
Without patch, that would cause WALBufMappingLock be released and
XLogCtl->InitializedUpTo not advanced. With the patch, that would
cause other processes infinitely waiting till we finish the
initialization.
Possible solution would be to save position of the page to be
initialized, and set it back to XLogCtl->InitializeReserved on error
(everywhere we do LWLockReleaseAll()). We also must check that on
error we only set XLogCtl->InitializeReserved to the past, because
there could be multiple concurrent failures. Also we need to
broadcast XLogCtl->InitializedUpToCondVar to wake up waiters.
> > Regarding 0002 patch, it looks generally reasonable. But are 2
> > attempts always optimal? Are there cases of regression, or cases when
> > more attempts are even better? Could we have there some
> > self-adjusting mechanism like what we have for spinlocks?
>
> Well, I chose to perform 3 probes (2 conditional attempts + 1
> unconditional) based on intuition. I have some experience in building hash
> tables, and cuckoo-hashing theory tells 2 probes is usually enough to reach
> 50% fill-rate, and 3 probes enough for ~75% fill rate. Since each probe is
> cache miss, it is hardly sensible to do more probes.
>
> 3 probes did better than 2 in other benchmark [1], although there were
> NUM_XLOGINSERT_LOCK increased.
>
> Excuse me for not bencmarking different choices here. I'll try to do
> measurements in next days.
>
> [1] https://postgr.es/m/3b11fdc2-9793-403d-b3d4-67ff9a00d447%40postgrespro.ru
Ok, let's wait for your measurements.
------
Regards,
Alexander Korotkov
Supabase
Attachment | Content-Type | Size |
---|---|---|
v5-0002-several-attempts-to-lock-WALInsertLocks.patch | application/octet-stream | 3.0 KB |
v5-0001-Get-rid-of-WALBufMappingLock.patch | application/octet-stream | 16.6 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Yura Sokolov | 2025-02-13 09:45:26 | Re: Get rid of WALBufMappingLock |
Previous Message | Ilia Evdokimov | 2025-02-13 09:05:55 | Re: explain analyze rows=%.0f |