| From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
|---|---|
| To: | Bruno Wolff III <bruno(at)wolff(dot)to> |
| Cc: | pgsql-hackers(at)postgreSQL(dot)org |
| Subject: | Re: Solving hash table overrun problems |
| Date: | 2005-03-04 15:42:08 |
| Message-ID: | 27706.1109950928@sss.pgh.pa.us |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Bruno Wolff III <bruno(at)wolff(dot)to> writes:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> * Estimate the number of batches N using the planner's estimate.
>> We will always choose N a power of 2. A tuple's batch number is
>> ((H div K) mod N).
> If K is way low this could be very slow.
How so? You're not concerned about the time to do the division itself
are you?
>> * Now begin scanning the outer join input. Tuples of batch number
>> zero (according to the current calculation) can be matched to the
>> current hashtable contents. Tuples of higher batch numbers are dropped
>> into holding files for the outer input, one per batch.
> For new keys at this step do you know their final disposition so that making
> new hash entries won't be necessary?
Well, we probably have a pretty fair idea of N at this point, so it'll
usually be right --- but we reserve the right to increase N again later
in case we have to do so because one of the later inner batches is much
bigger than the zero batch we are currently processing.
regards, tom lane
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Bruno Wolff III | 2005-03-04 15:46:07 | Re: Solving hash table overrun problems |
| Previous Message | Tom Lane | 2005-03-04 15:28:24 | Re: bitmap AM design |