Quick Links

BUG #2930: Hash join abyssmal with many null fields.

From:	"Maciej Babinski" <maciej+postgres(at)apathy(dot)killer-robot(dot)net>
To:	pgsql-bugs(at)postgresql(dot)org
Subject:	BUG #2930: Hash join abyssmal with many null fields.
Date:	2007-01-24 19:14:50
Message-ID:	200701241914.l0OJEou1005770@wwwmaster.postgresql.org
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-bugs

The following bug has been logged online:

Bug reference: 2930
Logged by: Maciej Babinski
Email address: maciej+postgres(at)killer-robot(dot)net
PostgreSQL version: 8.2
Operating system: CentOS 4.4 (linux 2.6.9)
Description: Hash join abyssmal with many null fields.
Details:

Hash join of columns with many null fields is very slow unless the null
fields are commented out. Steps to reproduce:

CREATE TABLE a (id integer, join_id integer);
CREATE TABLE b (id integer, join_id integer);

INSERT INTO a (id) SELECT generate_series(1,10000);
INSERT INTO b (id) SELECT generate_series(1,10000);

ANALYZE a;
ANALYZE b;

EXPLAIN ANALYZE SELECT * FROM a LEFT JOIN b ON a.join_id = b.join_id;
EXPLAIN ANALYZE SELECT * FROM a LEFT JOIN b ON a.join_id = b.join_id AND
b.join_id IS NOT NULL;

Here are the results of the two explains:

test1=# EXPLAIN ANALYZE SELECT * FROM a LEFT JOIN b ON a.join_id =
b.join_id;
EXPLAIN ANALYZE SELECT * FROM a LEFT JOIN b ON a.join_id = b.join_id AND
b.join_id IS NOT NULL;
QUERY PLAN
----------------------------------------------------------------------------
--------------------------------------
Hash Left Join (cost=170.00..1590.01 rows=10000 width=16) (actual
time=23.100..9451.778 rows=10000 loops=1)
Hash Cond: (a.join_id = b.join_id)
-> Seq Scan on a (cost=0.00..145.00 rows=10000 width=8) (actual
time=0.014..11.071 rows=10000 loops=1)
-> Hash (cost=145.00..145.00 rows=10000 width=8) (actual
time=21.999..21.999 rows=10000 loops=1)
-> Seq Scan on b (cost=0.00..145.00 rows=10000 width=8) (actual
time=0.011..10.679 rows=10000 loops=1)
Total runtime: 9460.944 ms
(6 rows)

test1=# EXPLAIN ANALYZE SELECT * FROM a LEFT JOIN b ON a.join_id = b.join_id
AND b.join_id IS NOT NULL;
QUERY PLAN
----------------------------------------------------------------------------
--------------------------------
Hash Left Join (cost=145.00..340.01 rows=10000 width=16) (actual
time=1.970..32.452 rows=10000 loops=1)
Hash Cond: (a.join_id = b.join_id)
-> Seq Scan on a (cost=0.00..145.00 rows=10000 width=8) (actual
time=0.033..10.326 rows=10000 loops=1)
-> Hash (cost=145.00..145.00 rows=1 width=8) (actual time=1.924..1.924
rows=0 loops=1)
-> Seq Scan on b (cost=0.00..145.00 rows=1 width=8) (actual
time=1.922..1.922 rows=0 loops=1)
Filter: (join_id IS NOT NULL)
Total runtime: 41.495 ms
(7 rows)

Note that the two ON clauses are logically identical, but the second query
is much faster than the first. The first query will slow down exponentially
with the number of rows: 200,000 rows yielded a runtime of over ten hours.
Thanks!

Responses

Re: BUG #2930: Hash join abyssmal with many null fields. at 2007-01-26 05:49:52 from Tom Lane

Browse pgsql-bugs by date

	From	Date	Subject
Next Message	Feng Chen	2007-01-24 19:25:12	Postgres Stored Procedure Call Function Return Type OID Caching Problem
Previous Message	Tom Lane	2007-01-24 19:04:23	Re: Function returns wrong data after datatype change