Hi,
Thinking about this a bit more, do we really need to build the hash
table on the first pass? Why not do this instead:
(1) batching
- read the tuples, stuff them into a simple list
- don't build the hash table yet
(2) building the hash table
- we have all the tuples in a simple list, batching is done
- we know exact row count, can size the table properly
- build the table
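
To make the idea a bit more concrete, here is a rough standalone sketch
of the two steps. It is not the actual hash join code: Tuple, HashTable,
collect_tuple() and build_table() are names made up for illustration,
and the tuple carries only a precomputed hash value. It also assumes one
bucket per tuple, rounded up to a power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical tuple: just a precomputed hash value and a link. */
typedef struct Tuple {
    uint32_t hash;
    struct Tuple *next;
} Tuple;

/* Hypothetical table state covering both steps. */
typedef struct {
    Tuple  *head;      /* step 1: simple list of all tuples */
    size_t  ntuples;   /* exact row count, known after step 1 */
    Tuple **buckets;   /* step 2: bucket array, NULL until built */
    size_t  nbuckets;  /* always a power of two */
} HashTable;

/* Step 1: stuff the tuple into the list, no buckets touched. */
static void collect_tuple(HashTable *ht, Tuple *t)
{
    t->next = ht->head;
    ht->head = t;
    ht->ntuples++;
}

/* Smallest power of two >= n (at least 1), capped to avoid overflow. */
static size_t next_pow2(size_t n)
{
    size_t p = 1;
    while (p < n && p <= SIZE_MAX / 2)
        p <<= 1;
    return p;
}

/* Step 2: size the bucket array from the exact count, then link
 * every collected tuple into its bucket. Returns 0 on success,
 * -1 if the bucket array cannot be allocated. */
static int build_table(HashTable *ht)
{
    ht->nbuckets = next_pow2(ht->ntuples);
    ht->buckets = calloc(ht->nbuckets, sizeof(Tuple *));
    if (ht->buckets == NULL)
        return -1;

    Tuple *t = ht->head;
    while (t != NULL) {
        Tuple *next = t->next;   /* save before relinking */
        size_t b = t->hash & (ht->nbuckets - 1);
        t->next = ht->buckets[b];
        ht->buckets[b] = t;
        t = next;
    }
    ht->head = NULL;             /* the list is now empty */
    return 0;
}
```

The point is that the bucket array is allocated exactly once, after the
row count is known, so there is no resizing and no rehashing while the
tuples are still being read.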
Also, maybe we could use a regular linear probing hash table [1], instead
of using the current implementation with NTUP_PER_BUCKET=1. (Although,
that'd be absolutely awful with duplicates, as all copies of a key start
probing from the same slot.)
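
A toy sketch of why duplicates hurt with linear probing as described in
[1]. Again, this is illustration only: ProbeTable, probe_insert() and the
fixed NSLOTS size are assumptions of the sketch, not anything from the
existing code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NSLOTS 16   /* hypothetical fixed size, must be a power of two */

/* Hypothetical open-addressing table storing only hash values. */
typedef struct {
    uint32_t hash[NSLOTS];
    int      used[NSLOTS];
    size_t   nused;
} ProbeTable;

/* Insert a hash value using linear probing. Returns the number of
 * slots inspected, or 0 if the table is already full. */
static size_t probe_insert(ProbeTable *pt, uint32_t hash)
{
    if (pt->nused == NSLOTS)
        return 0;

    size_t pos = hash & (NSLOTS - 1);
    size_t probes = 1;

    /* Walk forward (wrapping around) until a free slot is found. */
    while (pt->used[pos]) {
        pos = (pos + 1) & (NSLOTS - 1);
        probes++;
    }

    pt->used[pos] = 1;
    pt->hash[pos] = hash;
    pt->nused++;
    return probes;
}
```

Inserting the same hash value k times costs 1 + 2 + ... + k probes in
total, and the resulting run of occupied slots also slows down unrelated
keys whose home slot falls inside it.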
regards
Tomas
[1] http://en.wikipedia.org/wiki/Linear_probing