From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Joerg Sonnenberger <joerg(at)bec(dot)de> |
Cc: | David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, John Naylor <jcnaylor(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: reducing the footprint of ScanKeyword (was Re: Large writable variables) |
Date: | 2019-01-08 00:36:58 |
Message-ID: | 10021.1546907818@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
I wrote:
> Probably there's a lot to be criticized about the Perl style below;
> anybody feel a need to rewrite it?
Here's a somewhat better version. I realized that I was being too
slavishly tied to the data structures used in the C version; in Perl
it's easier to manage the lists of edges as hashes. I can't see any
need to distinguish left and right edge sets, either, so this just
has one such hash per vertex.
Also, it seems to me that we *can* make intelligent use of unused
hashtable entries to exit early on many non-keyword inputs. The reason
the existing code fails to do so is that it computes the sums and
differences of hashtable entries in unsigned modulo arithmetic; but if
we make the hashtable entries signed, we can set them up as exact
differences and drop the final modulo operation in the hash function.
Then, any out-of-range sum must indicate an input that is not a keyword
(because it is touching a pair of hashtable entries that didn't go
together) and we can exit early from the caller. This in turn lets us
mark unused hashtable entries with large values to ensure that sums
involving them will be out of range.
A weak spot in that argument is that it's not entirely clear how
large the differences can get --- with an unlucky series of
collisions, maybe they could get large enough to overflow int16?
I don't think that's likely for the size of problem this script
is going to encounter, so I just put in an error check for it.
But it could do with closer analysis before deciding that this
is a general-purpose solution.
regards, tom lane
Attachment | Content-Type | Size |
---|---|---|
perfect-hash-keyword-lookup-2.patch | text/x-diff | 19.5 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2019-01-08 00:37:51 | Re: reducing the footprint of ScanKeyword (was Re: Large writable variables) |
Previous Message | Andres Freund | 2019-01-08 00:31:59 | Re: Displaying and dumping of table access methods |