From: | John Naylor <john(dot)naylor(at)2ndquadrant(dot)com> |
---|---|
To: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | tweaking perfect hash multipliers |
Date: | 2020-03-30 13:33:14 |
Message-ID: | CACPNZCuVTiLhxAzXp9uCeHGUyHMa59h6_pmP+_W-SzXG0UyY9w@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi all,
While playing around with Peter E.'s unicode normalization patch [1],
I found that HEAD failed to build a perfect hash function for any of
the four sets of 4-byte keys ranging from 1k to 17k in number. It
probably doesn't help that codepoints have nul bytes and often cluster
into consecutive ranges. In addition, I found that a couple of the
candidate hash multipliers don't compile to shift-and-add
instructions, although they were chosen with that intent in mind. It
seems compilers will only do that if the number is exactly 2^n +/- 1.
Using the latest gcc and clang, I tested all prime numbers up to 5000
(plus 8191 for good measure), and found a handful that are compiled
into non-imul instructions. Dialing back the version, gcc 4.8 and
clang 7.0 are the earliest I found that have the same behavior as
newer ones. For reference:
https://gcc.godbolt.org/z/bxcXHu
In addition to shift-and-add, there are also a few using lea,
lea-and-add, or 2 leas.
Then I used the attached program to measure various combinations of
compiled instructions using two constant multipliers iterating over
bytes similar to a generated hash function.
<cc> -O2 -Wall test-const-mult.c test-const-mult-2.c
./a.out
Median of 3 with clang 10:
lea, lea 0.181s
lea, lea+add 0.248s
lea, shift+add 0.251s
lea+add, shift+add 0.273s
shift+add, shift+add 0.276s
2 leas, 2 leas 0.290s
shift+add, imul 0.329s
Taking this with a grain of salt, it nonetheless seems plausible that
a single lea could be faster than any two instructions here. The only
primes that compile to a single lea are 3 and 5, but I've found those
multipliers can build hash functions for all our keyword lists, as
demonstration. None of the others we didn't have already are
particularly interesting from a performance point of view.
With the unicode quick check, I found that the larger sets need (257,
8191) as multipliers to build the hash table, and none of the smaller
special primes I tested will work.
Keeping these two properties in mind, I came up with the scheme in the
attached patch that tries adjacent pairs in this array:
(3, 5, 17, 31, 127, 257, 8191)
so that we try (3,5) first, next (5,17), and then all the pure
shift-and-adds with (257,8191) last.
The main motivation is to be able to build the unicode quick check
tables, but if we ever use this functionality in a hot code path, we
may as well try to shave a few more cycles while we're at it.
--
John Naylor https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment | Content-Type | Size |
---|---|---|
tweak-perfect-hash-gen.patch | application/x-patch | 1.5 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Alexander Korotkov | 2020-03-30 13:35:33 | Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line |
Previous Message | Ahsan Hadi | 2020-03-30 12:58:18 | Re: WIP/PoC for parallel backup |