Re: BufferAlloc: don't take two simultaneous locks

From: Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, michail(dot)nikolaev(at)gmail(dot)com, x4mmm(at)yandex-team(dot)ru, andres(at)anarazel(dot)de, simon(dot)riggs(at)enterprisedb(dot)com
Subject: Re: BufferAlloc: don't take two simultaneous locks
Date: 2022-03-20 09:38:06
Message-ID: c6f11bacdab7f7ef0891dec0823759807038a91f.camel@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

В Чт, 17/03/2022 в 12:02 +0900, Kyotaro Horiguchi пишет:
> At Wed, 16 Mar 2022 14:11:58 +0300, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru> wrote in
> > В Ср, 16/03/2022 в 12:07 +0900, Kyotaro Horiguchi пишет:
> > > At Tue, 15 Mar 2022 13:47:17 +0300, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru> wrote in
> > > In v7, HASH_ENTER returns the element stored in DynaHashReuse using
> > > the freelist_idx of the new key. v8 uses that of the old key (at the
> > > time of HASH_REUSE). So in the case "REUSE->ENTER(elem exists and
> > > returns the stashed)" case the stashed element is returned to its
> > > original partition. But it is not what I mentioned.
> > >
> > > On the other hand, once the stahsed element is reused by HASH_ENTER,
> > > it gives the same resulting state with HASH_REMOVE->HASH_ENTER(borrow
> > > from old partition) case. I suspect that ththat the frequent freelist
> > > starvation comes from the latter case.
> >
> > Doubtfully. Due to probabilty theory, single partition doubdfully
> > will be too overflowed. Therefore, freelist.
>
> Yeah. I think so generally.
>
> > But! With 128kb shared buffers there is just 32 buffers. 32 entry for
> > 32 freelist partition - certainly some freelist partition will certainly
> > have 0 entry even if all entries are in freelists.
>
> Anyway, it's an extreme condition and the starvation happens only at a
> neglegible ratio.
>
> > > RETURNED: 2
> > > ALLOCED: 0
> > > BORROWED: 435
> > > REUSED: 495444
> > > ASSIGNED: 495467 (-23)
> > >
> > > Now "BORROWED" happens 0.8% of REUSED
> >
> > 0.08% actually :)
>
> Mmm. Doesn't matter:p
>
> > > > > > I lost access to Xeon 8354H, so returned to old Xeon X5675.
> > > > > ...
> > > > > > Strange thing: both master and patched version has higher
> > > > > > peak tps at X5676 at medium connections (17 or 27 clients)
> > > > > > than in first october version [1]. But lower tps at higher
> > > > > > connections number (>= 191 clients).
> > > > > > I'll try to bisect on master this unfortunate change.
> ...
> > I've checked. Looks like something had changed on the server, since
> > old master commit behaves now same to new one (and differently to
> > how it behaved in October).
> > I remember maintainance downtime of the server in november/december.
> > Probably, kernel were upgraded or some system settings were changed.
>
> One thing I have a little concern is that numbers shows 1-2% of
> degradation steadily for connection numbers < 17.
>
> I think there are two possible cause of the degradation.
>
> 1. Additional branch by consolidating HASH_ASSIGN into HASH_ENTER.
> This might cause degradation for memory-contended use.
>
> 2. nallocs operation might cause degradation on non-shared dynahasyes?
> I believe doesn't but I'm not sure.
>
> On a simple benchmarking with pgbench on a laptop, dynahash
> allocation (including shared and non-shared) happend about at 50
> times per second with 10 processes and 200 with 100 processes.
>
> > > I don't think nalloced needs to be the same width to long. For the
> > > platforms with 32-bit long, anyway the possible degradation if any by
> > > 64-bit atomic there doesn't matter. So don't we always define the
> > > atomic as 64bit and use the pg_atomic_* functions directly?
> >
> > Some 32bit platforms has no native 64bit atomics. Then they are
> > emulated with locks.
> >
> > Well, and for 32bit platform long is just enough. Why spend other
> > 4 bytes per each dynahash?
>
> I don't think additional bytes doesn't matter, but emulated atomic
> operations can matter. However I'm not sure which platform uses that
> fallback implementations. (x86 seems to have __sync_fetch_and_add()
> since P4).
>
> My opinion in the previous mail is that if that level of degradation
> caued by emulated atomic operations matters, we shouldn't use atomic
> there at all since atomic operations on the modern platforms are not
> also free.
>
> In relation to 2 above, if we observe that the degradation disappears
> by (tentatively) use non-atomic operations for nalloced, we should go
> back to the previous per-freelist nalloced.

Here is version with nalloced being union of appropriate atomic and
long.

------

regards
Yura Sokolov

Attachment Content-Type Size
v9-bufmgr-lock-improvements.patch text/x-patch 32.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2022-03-20 10:03:38 Re: pgsql: Add option to use ICU as global locale provider
Previous Message Thomas Munro 2022-03-20 09:36:26 Re: A test for replay of regression tests