Re: Why hash join instead of nested loop?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Rhett Garber <rhettg(at)gmail(dot)com>
Cc: Havasvölgyi Ottó <h(dot)otto(at)freemail(dot)hu>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Why hash join instead of nested loop?
Date: 2005-08-09 17:12:59
Message-ID: 17253.1123607579@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Rhett Garber <rhettg(at)gmail(dot)com> writes:
> Duplicated your setup in a separate DB.
> At least its reproducable for me.....

Hmm. Well, we now have several data points but they seem to be on
wildly varying hardware. To try to normalize the results a little,
I computed the total actual time for the hash plan divided by the sum
of the actual times for the two scan nodes. Thus, for your example:

> Hash Join (cost=4.83..5.91 rows=1 width=14) (actual time=7.148..7.159
> rows=1 loops=1)
> Hash Cond: ("outer".id = "inner".obj2)
> -> Seq Scan on rtmessagestate (cost=0.00..1.05 rows=5 width=14)
> (actual time=0.007..0.015 rows=5 loops=1)
> -> Hash (cost=4.83..4.83 rows=1 width=4) (actual
> time=0.055..0.055 rows=0 loops=1)
> -> Index Scan using connection_regid_obj1_index on
> connection (cost=0.00..4.83 rows=1 width=4) (actual time=0.028..0.032
> rows=1 loops=1)
> Index Cond: ((connection_registry_id = 40105) AND (obj1
> = 73582)) Total runtime: 7.693 ms
> (7 rows)

this would be 7.159 / (0.015 + 0.032). This is probably not an
enormously robust statistic but it at least focuses attention in the
right place. Here's what I get (rounded off to 4 digits which is surely
as much precision as we have in the numbers):

Tom 7.4.8+ 1.619
Ian 7.4.8 6.000
Ian 7.4.2 13.95
Steinar 7.4.7 8.833
Rhett orig 108.3
Rhett test 152.3
Michael 7.4.1 2.015

My number seems to be a bit of an outlier to the low side, but yours are
way the heck to the high side. And Michael's test seems to rule out the
idea that it's something broken in 7.4.1 in particular.

I'm now thinking you've got either a platform- or compiler-specific
problem. Exactly what is the hardware (the CPU not the disks)? How did
you build or come by the Postgres executables (compiler, configure
options, etc)?

regards, tom lane

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Dan Harris 2005-08-09 18:04:11 Table locking problems?
Previous Message Rhett Garber 2005-08-09 16:51:51 Re: Why hash join instead of nested loop?