
Division in dynahash.c due to HASH_FFACTOR

From:	Jakub Wartak <Jakub(dot)Wartak(at)tomtom(dot)com>
To:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Division in dynahash.c due to HASH_FFACTOR
Date:	2020-09-04 07:01:41
Message-ID:	VI1PR0701MB696044FC35013A96FECC7AC8F62D0@VI1PR0701MB6960.eurprd07.prod.outlook.com
Lists:	pgsql-hackers

Greetings hackers,

I have mixed feelings about whether this is a welcome contribution, as the potential gain is relatively small in my tests, but I would still like to point out that the HASH_FFACTOR functionality in dynahash.c could be removed or optimized (the default fill factor is always 1; there is not a single place inside the PostgreSQL repository that uses a custom fill factor other than DEF_FFACTOR=1). Because the functionality is present, there seems to be a division for every buffer access [BufTableLookup()] and every smgropen() call (every call to hash_search() is affected, provided it's not ShmemInitHash/HASH_PARTITION). This division is especially visible via perf on a single-process StartupXLOG WAL recovery on a standby under heavy-duty 100% CPU conditions, as the top entry in the profile is inside hash_search:
0x0000000000888751 <+449>: idiv r8
0x0000000000888754 <+452>: cmp rax,QWORD PTR [r15+0x338] <<-- perf annotate shows this as 30-40%, even with default -O2, probably due to CPU pipelining of the idiv above

I've made a PoC test to skip that division assuming ffactor would be gone:
if (!IS_PARTITIONED(hctl) && !hashp->frozen &&
- hctl->freeList[0].nentries / (long) (hctl->max_bucket + 1) >= hctl->ffactor &&
+ hctl->freeList[0].nentries >= (long) (hctl->max_bucket + 1) &&
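A minimal sketch of why the two conditions in the PoC agree when the fill factor is 1. The function names and the plain long parameters are hypothetical stand-ins for hctl->freeList[0].nentries, hctl->max_bucket + 1 and hctl->ffactor; this is not the dynahash.c code itself, only an illustration assuming a non-negative entry count and a positive bucket count.

```c
#include <assert.h>
#include <stdbool.h>

/* Original form: one integer division on every call. */
static bool
needs_expand_div(long nentries, long nbuckets, long ffactor)
{
    assert(nentries >= 0 && nbuckets > 0 && ffactor > 0);
    return nentries / nbuckets >= ffactor;
}

/*
 * PoC form: no division. Only equivalent to the form above when
 * ffactor == 1, which is the assumption the PoC makes.
 */
static bool
needs_expand_nodiv(long nentries, long nbuckets)
{
    assert(nentries >= 0 && nbuckets > 0);
    return nentries >= nbuckets;
}
```

With ffactor fixed at 1, nentries / nbuckets >= 1 holds exactly when nentries >= nbuckets, so the comparison alone gives the same answer.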

For a 3.7GB stream of WAL I'm getting a consistent improvement of ~4% (yes, I know it's small, that's why I'm having mixed feelings):
gcc -O3: 104->100s
gcc -O2: 108->104s
pgbench -S -c 16 -j 4 -T 30 -M prepared: stays more or less the same (-s 100), so no positive impact there

After removing HASH_FFACTOR PostgreSQL still compiles... Would removing it break some external API/extensions? I saw several optimizations for the "idiv" that could be applied, e.g. see https://github.com/ridiculousfish/libdivide. Or maybe there is some other idea to expose bottlenecks of BufTableLookup()? I also saw the codepath PinBuffer()->GetPrivateRefCountEntry() -> dynahash, which could be called pretty often. I have no idea what kind of pgbench stress test could be used to demonstrate the gain (or lack of it).
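If the fill factor had to stay configurable, one division-free alternative (not something proposed in this message, only a hedged sketch) would be to multiply instead of divide. The function names below are hypothetical, and the sketch assumes positive operands and that nbuckets * ffactor does not overflow long, which would need checking on platforms where long is 32 bits.

```c
#include <assert.h>
#include <stdbool.h>

/* Reference form with a division, mirroring the shape of the current check. */
static bool
over_fill_div(long nentries, long nbuckets, long ffactor)
{
    assert(nentries >= 0 && nbuckets > 0 && ffactor > 0);
    return nentries / nbuckets >= ffactor;
}

/*
 * Multiplication form. For non-negative nentries and positive nbuckets
 * and ffactor, floor(nentries / nbuckets) >= ffactor holds exactly when
 * nentries >= nbuckets * ffactor.
 * Assumption: the product fits in a long.
 */
static bool
over_fill_mul(long nentries, long nbuckets, long ffactor)
{
    assert(nentries >= 0 && nbuckets > 0 && ffactor > 0);
    return nentries >= nbuckets * ffactor;
}
```

Whether an integer multiply here would be measurably cheaper than the idiv in the recovery workload above is untested; it would need the same perf comparison.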

-Jakub Wartak.

Responses

Re: Division in dynahash.c due to HASH_FFACTOR at 2020-09-04 12:04:36 from Tomas Vondra
Re: Division in dynahash.c due to HASH_FFACTOR at 2020-09-04 14:34:39 from Alvaro Herrera
