From: | Jakub Wartak <Jakub(dot)Wartak(at)tomtom(dot)com> |
---|---|
To: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Division in dynahash.c due to HASH_FFACTOR |
Date: | 2020-09-04 07:01:41 |
Message-ID: | VI1PR0701MB696044FC35013A96FECC7AC8F62D0@VI1PR0701MB6960.eurprd07.prod.outlook.com |
Lists: | pgsql-hackers |
Greetings hackers,
I have mixed feelings about whether this is a welcome contribution, as the potential gain is relatively small in my tests, but I would still like to point out that the HASH_FFACTOR functionality in dynahash.c could be removed or optimized (the default fill factor is always 1; there is not a single place inside the PostgreSQL repository that uses a custom fill factor other than DEF_FFACTOR=1). Because the functionality is present, there seems to be a division for every buffer access [BufTableLookup()] and every smgropen() call (every call to hash_search() is affected, provided it's not ShmemInitHash/HASH_PARTITION). This division is especially visible via perf on the single-process StartupXLOG WAL recovery on a standby under heavy-duty 100% CPU conditions, as the top-1 entry is inside hash_search:
0x0000000000888751 <+449>: idiv r8
0x0000000000888754 <+452>: cmp rax,QWORD PTR [r15+0x338] <<-- perf annotate shows this at 30-40%, even with the default -O2, probably due to CPU pipelining of the idiv above
I've made a PoC test to skip that division assuming ffactor would be gone:
if (!IS_PARTITIONED(hctl) && !hashp->frozen &&
- hctl->freeList[0].nentries / (long) (hctl->max_bucket + 1) >= hctl->ffactor &&
+ hctl->freeList[0].nentries >= (long) (hctl->max_bucket + 1) &&
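The rewrite in the PoC relies on the fact that, for a non-negative entry count and a positive bucket count, `nentries / nbuckets >= 1` holds exactly when `nentries >= nbuckets`. Below is a minimal standalone sketch of that equivalence. The function names are hypothetical and are not part of dynahash.c, and the check only covers the case where the fill factor is 1.

```c
#include <assert.h>
#include <stdbool.h>

/* Original form of the check: one integer division per call. */
static inline bool
needs_split_div(long nentries, long nbuckets, long ffactor)
{
    return nentries / nbuckets >= ffactor;
}

/* PoC form: assumes ffactor == 1, so the division becomes a comparison. */
static inline bool
needs_split_cmp(long nentries, long nbuckets)
{
    return nentries >= nbuckets;
}

/*
 * Compare both forms for every (nentries, nbuckets) pair with
 * 0 <= nentries <= limit and 1 <= nbuckets <= limit.
 * Returns the number of pairs where the two forms disagree.
 */
static inline long
count_mismatches(long limit)
{
    long mismatches = 0;

    assert(limit >= 1);
    for (long n = 0; n <= limit; n++)
        for (long b = 1; b <= limit; b++)
            if (needs_split_div(n, b, 1) != needs_split_cmp(n, b))
                mismatches++;
    return mismatches;
}
```

Calling `count_mismatches()` with a small limit is a cheap sanity check that the two conditions agree over that range. It says nothing about the performance difference, which has to be measured as described in this message.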
For a 3.7GB stream of WAL I'm getting a consistent improvement of ~4% (yes, I know it's small; that's why I'm having mixed feelings):
gcc -O3: 104->100s
gcc -O2: 108->104s
pgbench -S -c 16 -j 4 -T 30 -M prepared: stays more or less the same (-s 100), so no positive impact there
After removing HASH_FFACTOR PostgreSQL still compiles... Would removing it break some external API/extensions? I saw several optimizations for "idiv", e.g. see https://github.com/ridiculousfish/libdivide. Or maybe there is some other idea to expose bottlenecks of BufTableLookup()? I also saw the codepath PinBuffer() -> GetPrivateRefCountEntry() -> dynahash, which could be called pretty often. I have no idea what kind of pgbench stress test could be used to demonstrate the gain (or lack of it).
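If a configurable fill factor had to stay, one possible way to avoid the idiv (a sketch of a generic alternative, not something tested in this message and not what libdivide does) is to multiply instead of divide: for non-negative `nentries`, positive `nbuckets` and `ffactor >= 1`, `nentries / nbuckets >= ffactor` is equivalent to `nentries >= ffactor * nbuckets`, assuming the product does not overflow. The function names below are hypothetical.

```c
#include <stdbool.h>

/* Division-based check, in the shape of the current condition. */
static inline bool
fill_check_div(long nentries, long nbuckets, long ffactor)
{
    return nentries / nbuckets >= ffactor;
}

/*
 * Multiplication-based check. Assumes ffactor * nbuckets fits in a long;
 * real code would need to guarantee or guard that.
 */
static inline bool
fill_check_mul(long nentries, long nbuckets, long ffactor)
{
    return nentries >= ffactor * nbuckets;
}

/*
 * Count disagreements between the two forms over a small range of
 * entry counts, bucket counts and fill factors.
 */
static inline long
count_mul_mismatches(long limit, long max_ffactor)
{
    long mismatches = 0;

    for (long f = 1; f <= max_ffactor; f++)
        for (long n = 0; n <= limit; n++)
            for (long b = 1; b <= limit; b++)
                if (fill_check_div(n, b, f) != fill_check_mul(n, b, f))
                    mismatches++;
    return mismatches;
}
```

Whether a multiply is measurably cheaper than the plain comparison from the PoC would need the same kind of WAL replay measurement; with ffactor fixed at 1 the multiplication is unnecessary.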
-Jakub Wartak.