| From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
|---|---|
| To: | "Hiroshi Inoue" <Inoue(at)tpf(dot)co(dot)jp> |
| Cc: | "pgsql-hackers" <pgsql-hackers(at)postgreSQL(dot)org> |
| Subject: | Re: [HACKERS] Hash Join is very slooow in some cases |
| Date: | 1999-11-19 03:51:20 |
| Message-ID: | 789.942983480@sss.pgh.pa.us |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
"Hiroshi Inoue" <Inoue(at)tpf(dot)co(dot)jp> writes:
> select count(*) from a,b where a.id1=b.id1;
> returns immeidaitely ...
> But
> select count(*) from a,b where a.id1=b.id1 and a.id2=b.id2;
> takes very looong time.
> I examined an output by EXPLAIN VERBOSE and found that
> the 1st query uses id1 as its hashkey and 2nd query uses id2
> as its hashkey.
Yes, and since id2 has terrible disbursion, most of the hashtable
entries end up in a small number of hash buckets, resulting in
an unexpectedly large number of comparisons done for each outer
tuple. I've seen this effect before.
I have a TODO item to make the optimizer pay attention to disbursion
when estimating the cost of a hashjoin. That would cause it to make
the right choice of key in this example. Not done yet though :-(.
Feel free to jump in if you need it today...
regards, tom lane
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Tom Lane | 1999-11-19 04:06:58 | Re: [HACKERS] pg version date file |
| Previous Message | Tom Lane | 1999-11-19 03:45:14 | Re: [HACKERS] Getting OID in psql of recent insert |