Re: BufferAlloc: don't take two simultaneous locks

From: Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
To: Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, michail(dot)nikolaev(at)gmail(dot)com, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: BufferAlloc: don't take two simultaneous locks
Date: 2022-03-01 07:24:22
Message-ID: a033d901ed0120ab5bf7562d03a5bea3e33dacb6.camel@postgrespro.ru
Lists: pgsql-hackers

On Fri, 25/02/2022 at 09:38 +0000, Simon Riggs wrote:
> On Fri, 25 Feb 2022 at 09:24, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru> wrote:
>
> > > This approach is cleaner than v1, but should also perform better
> > > because there will be a 1:1 relationship between a buffer and its
> > > dynahash entry, most of the time.
> >
> > Thank you for suggestion. Yes, it is much clearer than my initial proposal.
> >
> > Should I incorporate it to v4 patch? Perhaps, it could be a separate
> > commit in new version.
>
> I don't insist that you do that, but since the API changes are a few
> hours work ISTM better to include in one patch for combined perf
> testing. It would be better to put all changes in this area into PG15
> than to split it across multiple releases.
>
> > Why there is need for this? Which way backend could be forced to abort
> > between BufTableReuse and BufTableAssign in this code path? I don't
> > see any CHECK_FOR_INTERRUPTS on the way, but may be I'm missing
> > something.
>
> Sounds reasonable.

Ok, here is v4.
It contains two commits: one for the BufferAlloc locking change and
another for avoiding dynahash's freelist.

The buffer locking patch is the same as v2 with some comment changes,
i.e. it uses Lock+UnlockBufHdr.

For dynahash, HASH_REUSE and HASH_ASSIGN are added, as suggested.
HASH_REUSE stores the deleted element in a per-process static variable.
HASH_ASSIGN uses this element instead of the freelist; if there is no
such stored element, it falls back to HASH_ENTER.

I've implemented Robert Haas's suggestion to count elements in
freelists instead of maintaining nentries:

> One idea is to jigger things so that we maintain a count of the total
> number of entries that doesn't change except when we allocate, and
> then for each freelist partition we maintain the number of entries in
> that freelist partition. So then the size of the hash table, instead
> of being sum(nentries) is totalsize - sum(nfree).

https://postgr.es/m/CA%2BTgmoZkg-04rcNRURt%3DjAG0Cs5oPyB-qKxH4wqX09e-oXy-nw%40mail.gmail.com

This avoids taking the freelist lock just to update counters.
I implemented it by replacing "nentries" with "nfree" and adding
"nalloced" to each freelist. It also makes "hash_update_hash_key" valid
for a key that migrates between partitions.

I believe there is no need for a per-freelist "nalloced"; a single such
field in HASHHDR should suffice. Moreover, it seems to me the
`element_alloc` function does not need to acquire the freelist
partition lock, since it is called only during initialization of a
shared hash table. Am I right?

I didn't go down this path in v4 for simplicity, but can put it into v5
if approved.

To be honest, the "reuse" patch gives little improvement, but it is
still measurable at some connection counts.

I tried reducing the number of freelist partitions to 8, but it had
mixed impact: most of the time performance is the same, but sometimes
a bit lower. I didn't investigate the reasons; perhaps they are not
related to the buffer manager.

I didn't introduce the new functions BufTableReuse and BufTableAssign,
since there is a single call to BufTableInsert and two calls to
BufTableDelete. So I reused these functions and just added a "reuse"
flag to BufTableDelete.

Tests: simple_select on a Xeon 8354H, with 128MB and 1GB shared
buffers, at scale 100.

1 socket:
conns | master | patch_v4 | master 1G | patch_v4 1G
--------+------------+------------+------------+------------
1 | 41975 | 41540 | 52898 | 52213
2 | 77693 | 77908 | 97571 | 98371
3 | 114713 | 115522 | 142709 | 145226
5 | 188898 | 187617 | 239322 | 237269
7 | 261516 | 260006 | 329119 | 329449
17 | 521821 | 519473 | 672390 | 662106
27 | 555487 | 555697 | 674630 | 672736
53 | 868213 | 896539 | 1190734 | 1202505
83 | 868232 | 866029 | 1164997 | 1158719
107 | 850477 | 845685 | 1140597 | 1134502
139 | 816311 | 816808 | 1101471 | 1091258
163 | 794788 | 796517 | 1078445 | 1071568
191 | 765934 | 776185 | 1059497 | 1041944
211 | 738656 | 777365 | 1083356 | 1046422
239 | 713124 | 841337 | 1104629 | 1116668
271 | 692138 | 847803 | 1094432 | 1128971
307 | 682919 | 849239 | 1086306 | 1127051
353 | 679449 | 842125 | 1071482 | 1117471
397 | 676217 | 844015 | 1058937 | 1118628

2 sockets:
conns | master | patch_v4 | master 1G | patch_v4 1G
--------+------------+------------+------------+------------
1 | 44317 | 44034 | 53920 | 53583
2 | 81193 | 78621 | 99138 | 97968
3 | 120755 | 115648 | 148102 | 147423
5 | 190007 | 188943 | 232078 | 231029
7 | 258602 | 260649 | 325545 | 318567
17 | 551814 | 552914 | 692312 | 697518
27 | 787353 | 786573 | 1023509 | 1022891
53 | 973880 | 1008534 | 1228274 | 1278194
83 | 1108442 | 1269777 | 1596292 | 1648156
107 | 1072188 | 1339634 | 1542401 | 1664476
139 | 1000446 | 1316372 | 1490757 | 1676127
163 | 967378 | 1257445 | 1461468 | 1655574
191 | 926010 | 1189591 | 1435317 | 1639313
211 | 909919 | 1149905 | 1417437 | 1632764
239 | 895944 | 1115681 | 1393530 | 1616329
271 | 880545 | 1090208 | 1374878 | 1609544
307 | 865560 | 1066798 | 1355164 | 1593769
353 | 857591 | 1046426 | 1330069 | 1584006
397 | 840374 | 1024711 | 1312257 | 1564872

--------

regards

Yura Sokolov
Postgres Professional
y(dot)sokolov(at)postgrespro(dot)ru
funny(dot)falcon(at)gmail(dot)com

Attachment Content-Type Size
v4-0002-PGPRO-5616-Add-HASH_REUSE-HASH_ASSIGN-and-use-it-.patch text/x-patch 14.0 KB
v4-0001-bufmgr-do-not-acquire-two-partition-lo.patch text/x-patch 8.5 KB
image/gif 13.4 KB
image/gif 13.9 KB
res.zip application/zip 47.5 KB
