From: | Joerg Sonnenberger <joerg(at)bec(dot)de> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | John Naylor <john(dot)naylor(at)2ndquadrant(dot)com>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Joerg Sonnenberger <joerg(at)bec(dot)de>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: reducing the footprint of ScanKeyword (was Re: Large writable variables) |
Date: | 2019-01-09 20:35:20 |
Message-ID: | 20190109203520.GA4744@britannica.bec.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Jan 09, 2019 at 02:04:15PM -0500, Tom Lane wrote:
> Also, in view of finding that the original multiplier choices failed
> on the fmgr oid problem, I spent a little effort making the code
> able to try more combinations of hash multipliers and seeds. It'd
> be nice to have some theory rather than just heuristics about what
> will work, though ...
The theory is that the code needs two families of pair-wise independent
hash functions to give the O(1) number of tries. For practical purposes,
the Jenkins hash easily qualifies or any cryptographic hash function.
The downside is that those hash functions are very expensive though.
A multiplicative hash, especially using the high bits of the results
(e.g. the top 32bit of a 32x32->64 multiplication) qualifies for the
requirements, but for strings of input it would need a pair of constant
per word. So the choice of a hash function family for a performance
sensitive part is a heuristic in the sense of trying to get away with as
simple a function as possible.
Joerg
From | Date | Subject | |
---|---|---|---|
Next Message | Daniel Verite | 2019-01-09 21:01:36 | Re: insensitive collations |
Previous Message | Tom Lane | 2019-01-09 20:32:37 | Re: reducing the footprint of ScanKeyword (was Re: Large writable variables) |