From: "Timothy J. Kordas" <tkordas(at)greenplum(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: hash join hashtable size and work_mem
Date: 2007-03-14 16:32:46
Message-ID: 45F823AE.6040900@greenplum.com
Lists: pgsql-hackers
In nodeHash.c, the function ExecChooseHashTableSize() uses two different
methods for determining the number of buckets to use.

The current code looks something like:
    if (ntuples * tupsize > work_mem * 1024L)
        buckets = (work_mem * 1024L) / (tupsize * NTUP_PER_BUCKET);  /* spill expected */
    else
        buckets = ntuples / NTUP_PER_BUCKET;                         /* expected to fit in memory */
So for the case where a spill is expected, we use work_mem to decide on our
hash table size. For the case where a spill isn't expected, we rely on the row
estimate alone -- and make no provision for speeding up the join by using the
memory that we're allowed to use.
When profiling large hash joins, scanning the hash buckets often turns out to
be a bottleneck; it would be nice for the user to be able to "throw memory"
at a join to improve performance.
Am I missing something about the current implementation? I would expect the
bucket count to be calculated as something like:

    buckets = (work_mem * 1024L) / (tupsize * NTUP_PER_BUCKET)

for both cases?
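As a rough sketch of that unified calculation (a hypothetical standalone
helper, not a patch against nodeHash.c, assuming work_mem is in kilobytes as
usual and NTUP_PER_BUCKET keeps its current value of 10):

    #include <limits.h>

    #define NTUP_PER_BUCKET 10          /* same constant nodeHash.c uses */

    /* Hypothetical helper: size the bucket array from the memory budget
     * in both the spill and no-spill cases. */
    static int
    choose_nbuckets(int tupsize, int work_mem)
    {
        long hash_table_bytes = (long) work_mem * 1024L;
        long buckets = hash_table_bytes / ((long) tupsize * NTUP_PER_BUCKET);

        if (buckets < 1)
            buckets = 1;
        if (buckets > INT_MAX)
            buckets = INT_MAX;

        return (int) buckets;
    }

With work_mem set to 256MB and 100-byte tuples, for example, this comes out to
roughly 268,000 buckets whether or not the planner expects a spill.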
Making this change appears to improve hash-join performance substantially in
some cases, and as far as I can tell it doesn't hurt anything (apart from
using memory that the join is "allowed" to use given a particular work_mem
setting).
-Tim
--
tkordas(at)greenplum(dot)com