From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A better way than tweaking NTUP_PER_BUCKET
Date: 2013-06-22 13:48:12
Message-ID: 20130622134812.GG7093@tamriel.snowman.net
Lists: pgsql-hackers
Simon,
* Simon Riggs (simon(at)2ndQuadrant(dot)com) wrote:
> Previous discussions of Hash Joins have noted that the performance
> decreases when the average number of tuples per bucket increases.
Having the hash table so small that we get hash bucket collisions between
different 32-bit hash values is extremely painful, however.
> The correct calculation that would match the objective set out in the
> comment would be
>
> dbuckets = (hash_table_bytes / tupsize) / NTUP_PER_BUCKET;
This looks to be driving the size of the hash table off of "how many
tuples of this size can I fit into memory?" and ignoring how many rows
we actually have to hash. Consider a work_mem of 1GB with a small
number of rows to actually hash, say 50. With a tupsize of 8 bytes,
we'd be creating a hash table sized for some 13M buckets.
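
To make that concrete, here's a quick standalone sketch of the
arithmetic (plain C, not the nodeHash.c code; the 1GB / 8-byte / 50-row
figures are just the example above, and NTUP_PER_BUCKET's current
default of 10 is assumed):

#include <stdio.h>

#define NTUP_PER_BUCKET 10      /* current default */

int
main(void)
{
    long    hash_table_bytes = 1024L * 1024L * 1024L;   /* 1GB work_mem */
    long    tupsize = 8;                                 /* tiny tuples */
    long    ntuples = 50;                                /* rows we actually hash */

    /* the proposed calculation: driven entirely by memory, not by rows */
    long    dbuckets = (hash_table_bytes / tupsize) / NTUP_PER_BUCKET;

    printf("buckets sized from memory: %ld\n", dbuckets);  /* ~13.4 million */
    printf("rows actually hashed:      %ld\n", ntuples);   /* 50 */
    return 0;
}

That's roughly 13.4 million buckets allocated to hold 50 rows.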
> This solves the problem in earlier discussions since we get a lower
> average number of tuples per bucket and yet we also get to keep the
> current NTUP_PER_BUCKET value. Everybody wins.
I agree that this should reduce the number of tuples per bucket, but I
don't think it's doing what you expect, and, based on what it is doing,
it seems wasteful.
> Comments?
Based on your argument that we want to have a bucket load which is, on
average, the size of NTUP_PER_BUCKET, I have to '-1' on this. What we
want is to size the table large enough that we never have any true
collisions (cases where different 32-bit hash values end up in the same
bucket) *for all tuples being hashed*, and that includes both the side
building the hash table and the side doing the scan. This should be
done when we can avoid batching; if we don't have enough memory to avoid
batching while having such a large table, we should consider adjusting
the hash table size down some because, in those cases, we're memory
constrained.
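
A minimal sketch of the sizing policy I mean (a hypothetical helper,
not the actual ExecChooseHashTableSize() code; it assumes the planner's
row estimate is usable and that bucket headers are cheap relative to
the tuples themselves):

/*
 * Hypothetical illustration only: choose a bucket count large enough
 * that every estimated tuple can land in its own bucket (no true
 * collisions), backing off only when we'd be memory constrained and
 * batching anyway.
 */
long
choose_nbuckets(double ntuples, long hash_table_bytes, long tupsize)
{
    long    nbuckets = 1;
    long    max_tuples_in_memory = hash_table_bytes / tupsize;

    /* round the estimated tuple count up to the next power of two */
    while (nbuckets < (long) ntuples)
        nbuckets <<= 1;

    /*
     * If the input won't fit in work_mem anyway (we'll be batching),
     * don't burn memory on mostly-empty buckets; scale the table back.
     */
    if ((long) ntuples > max_tuples_in_memory)
    {
        nbuckets = 1;
        while (nbuckets * 2 <= max_tuples_in_memory)
            nbuckets <<= 1;
    }
    return nbuckets;
}

The power-of-two rounding also keeps the bucket lookup to a simple mask
on the hash value.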
When we have enough memory to avoid batching, we never want to have to
check down through a bucket for a tuple which doesn't exist; we should
simply be able to look in the hash table itself, determine (with a
single fetch) that there are no entries for that hash value, throw the
tuple away, and move on to the next.
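
To illustrate the single-fetch point, a hypothetical probe loop (again
not the executor code; HashEntry and probe() are made up for the
example, and nbuckets is assumed to be a power of two as above):

#include <stddef.h>
#include <stdint.h>

typedef struct HashEntry
{
    struct HashEntry *next;
    uint32_t    hashvalue;
    /* ... tuple data would follow ... */
} HashEntry;

/*
 * Probe the table for an outer tuple's hash value.  When the table is
 * large enough that distinct hash values rarely share a bucket, the
 * NULL check on the bucket head is usually all we pay for a
 * non-matching outer tuple.
 */
HashEntry *
probe(HashEntry **buckets, long nbuckets, uint32_t hashvalue)
{
    HashEntry  *entry = buckets[hashvalue & (nbuckets - 1)];

    if (entry == NULL)
        return NULL;            /* single fetch: nothing hashed here */

    for (; entry != NULL; entry = entry->next)
    {
        if (entry->hashvalue == hashvalue)
            return entry;       /* caller still rechecks the join quals */
    }
    return NULL;
}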
Thanks,
Stephen