From: | David Rowley <dgrowleyml(at)gmail(dot)com> |
---|---|
To: | Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru> |
Cc: | PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Use simplehash.h instead of dynahash in SMgr |
Date: | 2021-04-25 13:36:21 |
Message-ID: | CAApHDvoDpMifmDM2YcrUc=u9gYFNwjGQww6jXT=giGYK_5EP+A@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Sun, 25 Apr 2021 at 18:48, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru> wrote:
> If you read comments in SH_START_ITERATE, you'll see:
>
> * Search for the first empty element. As deletions during iterations
> are
> * supported, we want to start/end at an element that cannot be
> affected
> * by elements being shifted.
>
> * Iterate backwards, that allows the current element to be deleted,
> even
> * if there are backward shifts
>
> Therefore, it is safe to delete during iteration, and it doesn't lead
> nor
> require loop restart.
I had only skimmed that with a pre-loaded assumption that it wouldn't
be safe. I didn't do a very good job of reading it as I failed to
notice the lack of guarantees were about deleting items other than the
current one. I didn't consider the option of finding a free bucket
then looping backwards to avoid missing entries that are moved up
during a delete.
With that, I changed the patch to get rid of the SH_TRUNCATE and got
rid of the smgrclose_internal which skips the hash delete. The code
is much more similar to how it was now.
In regards to the hashing stuff. I added a new function to
pg_bitutils.h to rotate left and I'm using that instead of the other
expression that was taken from nodeHash.c
For the hash function, I've done some further benchmarking with:
1) The attached v2 patch
2) The attached + plus use_hash_combine.patch.txt which uses
hash_combine() instead of pg_rotate_left32()ing the hashkey each time.
3) The attached v2 with use_hash_bytes.patch.txt applied.
4) Master
I've also included the hash stats from each version of the hash function.
I hope the numbers help indicate the reason I picked the hash function
that I did.
1) v2 patch.
CPU: user: 108.23 s, system: 6.97 s, elapsed: 115.63 s
CPU: user: 114.78 s, system: 6.88 s, elapsed: 121.71 s
CPU: user: 107.53 s, system: 5.70 s, elapsed: 113.28 s
CPU: user: 108.43 s, system: 5.73 s, elapsed: 114.22 s
CPU: user: 106.18 s, system: 5.73 s, elapsed: 111.96 s
CPU: user: 108.04 s, system: 5.29 s, elapsed: 113.39 s
CPU: user: 107.64 s, system: 5.64 s, elapsed: 113.34 s
CPU: user: 106.64 s, system: 5.58 s, elapsed: 112.27 s
CPU: user: 107.91 s, system: 5.40 s, elapsed: 113.36 s
CPU: user: 115.35 s, system: 6.60 s, elapsed: 122.01 s
Median = 113.375 s
LOG: size: 4096, members: 2032, filled: 0.496094, total chain: 997,
max chain: 5, avg chain: 0.490650, total_collisions: 422,
max_collisions: 3, avg_collisions: 0.207677
2) v2 patch + use_hash_combine.patch.txt
CPU: user: 113.22 s, system: 5.52 s, elapsed: 118.80 s
CPU: user: 116.63 s, system: 5.87 s, elapsed: 122.56 s
CPU: user: 115.33 s, system: 5.73 s, elapsed: 121.12 s
CPU: user: 113.11 s, system: 5.61 s, elapsed: 118.78 s
CPU: user: 112.56 s, system: 5.51 s, elapsed: 118.13 s
CPU: user: 114.55 s, system: 5.80 s, elapsed: 120.40 s
CPU: user: 121.79 s, system: 6.45 s, elapsed: 128.29 s
CPU: user: 113.98 s, system: 4.50 s, elapsed: 118.52 s
CPU: user: 113.24 s, system: 5.63 s, elapsed: 118.93 s
CPU: user: 114.11 s, system: 5.60 s, elapsed: 119.78 s
Median = 119.355 s
LOG: size: 4096, members: 2032, filled: 0.496094, total chain: 971,
max chain: 6, avg chain: 0.477854, total_collisions: 433,
max_collisions: 4, avg_collisions: 0.213091
3) v2 patch + use_hash_bytes.patch.txt
CPU: user: 120.87 s, system: 6.69 s, elapsed: 127.62 s
CPU: user: 112.40 s, system: 4.68 s, elapsed: 117.14 s
CPU: user: 113.19 s, system: 5.44 s, elapsed: 118.69 s
CPU: user: 112.15 s, system: 4.73 s, elapsed: 116.93 s
CPU: user: 111.10 s, system: 5.59 s, elapsed: 116.74 s
CPU: user: 112.03 s, system: 5.74 s, elapsed: 117.82 s
CPU: user: 113.69 s, system: 4.33 s, elapsed: 118.07 s
CPU: user: 113.30 s, system: 4.19 s, elapsed: 117.55 s
CPU: user: 112.77 s, system: 5.57 s, elapsed: 118.39 s
CPU: user: 112.25 s, system: 4.59 s, elapsed: 116.88 s
Median = 117.685 s
LOG: size: 4096, members: 2032, filled: 0.496094, total chain: 900,
max chain: 4, avg chain: 0.442913, total_collisions: 415,
max_collisions: 4, avg_collisions: 0.204232
4) master
CPU: user: 117.89 s, system: 5.7 s, elapsed: 123.65 s
CPU: user: 117.81 s, system: 5.74 s, elapsed: 123.62 s
CPU: user: 119.39 s, system: 5.75 s, elapsed: 125.2 s
CPU: user: 117.98 s, system: 4.39 s, elapsed: 122.41 s
CPU: user: 117.92 s, system: 4.79 s, elapsed: 122.76 s
CPU: user: 119.84 s, system: 4.75 s, elapsed: 124.64 s
CPU: user: 120.6 s, system: 5.82 s, elapsed: 126.49 s
CPU: user: 118.74 s, system: 5.71 s, elapsed: 124.51 s
CPU: user: 124.29 s, system: 6.79 s, elapsed: 131.14 s
CPU: user: 118.73 s, system: 5.67 s, elapsed: 124.47 s
Median = 124.49 s
You can see that the bare v2 patch is quite a bit faster than any of
the alternatives. We'd be better of with hash_bytes than using
hash_combine() both for performance and for the seemingly better job
the hash function does at reducing the hash collisions.
David
Attachment | Content-Type | Size |
---|---|---|
use_hash_bytes.patch.txt | text/plain | 2.1 KB |
v2-0001-Use-simplehash.h-hashtables-in-SMgr.patch | application/octet-stream | 10.6 KB |
use_hash_combine.patch.txt | text/plain | 1.0 KB |
recovery_panic.patch.txt | text/plain | 1.4 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Nitin Jadhav | 2021-04-25 14:19:47 | Re: [PATCH] Automatic HASH and LIST partition creation |
Previous Message | Andrey Borodin | 2021-04-25 11:10:35 | Re: Allowing to create LEAKPROOF functions to non-superuser |