Quick Links

Hash Join vs Nested Loops in 7.2.1 ...

From:	Ed Loehr <pggeneral(at)bluepolka(dot)net>
To:	pgsql-general(at)postgresql(dot)org
Subject:	Hash Join vs Nested Loops in 7.2.1 ...
Date:	2002-04-09 04:09:23
Message-ID:	3CB26973.5010808@bluepolka.net
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

I have a 7.2.1 query with two peculiar characteristics and wondered if anyone
could offer some insight.

First, my query takes 90 seconds with a hash join, but only 1 second with
nested loops.

Second, the same query sometimes takes 10-50 seconds shortly after possibly a
dump or other high-data-volume queries are executed, after which it then
returns to 1 second execution time. Getting crowded out of shared memory?

Finally, I am inclined to turn off hash joins altogether. What should I
expect to lose in doing so?

Schema, query, and plans shown below...

Thanks,
Ed

-- 700,000 rows
CREATE TABLE story (
value TEXT,
key SERIAL
);
CREATE UNIQUE INDEX story_pkey ON story USING btree (key);

-- 700,000 rows
CREATE TABLE dict (
word VARCHAR,
key SERIAL
);
CREATE UNIQUE INDEX dict_pkey ON dict USING btree (key);
CREATE UNIQUE INDEX dict_word_key ON dict USING btree (word);

-- 28,000,000 rows
CREATE TABLE story_dict (
dictkey INTEGER NOT NULL,
storykey INTEGER NOT NULL
);
CREATE UNIQUE INDEX story_dict_pkey ON story_dict(dictkey, storykey);
CREATE INDEX story_dict_tk_idx ON story_dict(storykey);

-- Query:
-- ======

SELECT DISTINCT ftd1.storykey
FROM story_dict ftd1, dict d1
WHERE d1.word = 'foo'
AND d1.key = ftd1.dictkey
AND EXISTS (
SELECT ft2.key
FROM story ft2, story_dict ftd2, dict d2
WHERE d2.word = 'bar'
AND d2.key = ftd2.dictkey
AND ftd2.storykey = ft2.key
AND ftd2.storykey = ftd1.storykey)
AND EXISTS (
SELECT ft3.key
FROM story ft3, story_dict ftd3, dict d3
WHERE d3.word = 'baz'
AND d3.key = ftd3.dictkey
AND ftd3.storykey = ft3.key
AND ftd3.storykey = ftd1.storykey)
ORDER BY ftd1.storykey
LIMIT 1000;

Plan with PGOPTIONS = "-fn":

Limit (cost=15409053054.71..15409053054.73 rows=1 width=12)
-> Unique (cost=15409053054.71..15409053054.73 rows=1 width=12)
-> Sort (cost=15409053054.71..15409053054.71 rows=9 width=12)
-> Nested Loop (cost=100000000.00..15409053054.57 rows=9 width=12)
-> Index Scan using word_idx on dict d1 (cost=0.00..5.98 rows=1 width=4)
-> Index Scan using story_dict_pkey on story_dict ftd1
(cost=0.00..15309052993.63 rows=4398 width=8)
SubPlan
-> Merge Join (cost=429157.62..435130.62 rows=1 width=16)
-> Sort (cost=13.11..13.11 rows=1 width=12)
-> Hash Join (cost=5.98..13.10 rows=1 width=12)
-> Index Scan using story_dict_tk_idx on story_dict ftd2
(cost=0.00..6.59 rows=106 width=8)
-> Hash (cost=5.98..5.98 rows=1 width=4)
-> Index Scan using word_idx on dict d2 (cost=0.00..5.98 rows=1
width=4)
-> Sort (cost=429144.50..429144.50 rows=2389196 width=4)
-> Seq Scan on story ft2 (cost=0.00..86701.96 rows=2389196 width=4)
-> Merge Join (cost=429157.62..435130.62 rows=1 width=16)
-> Sort (cost=13.11..13.11 rows=1 width=12)
-> Hash Join (cost=5.98..13.10 rows=1 width=12)
-> Index Scan using story_dict_tk_idx on story_dict ftd3
(cost=0.00..6.59 rows=106 width=8)
-> Hash (cost=5.98..5.98 rows=1 width=4)
-> Index Scan using word_idx on dict d3 (cost=0.00..5.98 rows=1
width=4)
-> Sort (cost=429144.50..429144.50 rows=2389196 width=4)
-> Seq Scan on story ft3 (cost=0.00..86701.96 rows=2389196 width=4)

Plan with PGOPTIONS = "-fh":

Limit (cost=635283.31..635283.33 rows=1 width=12)
-> Unique (cost=635283.31..635283.33 rows=1 width=12)
-> Sort (cost=635283.31..635283.31 rows=9 width=12)
-> Nested Loop (cost=0.00..635283.17 rows=9 width=12)
-> Index Scan using word_idx on dict d1 (cost=0.00..5.98 rows=1 width=4)
-> Index Scan using story_dict_pkey on story_dict ftd1
(cost=0.00..635222.22 rows=4398 width=8)
SubPlan
-> Nested Loop (cost=0.00..16.07 rows=1 width=16)
-> Nested Loop (cost=0.00..12.00 rows=1 width=12)
-> Index Scan using word_idx on dict d2 (cost=0.00..5.98 rows=1
width=4)
-> Index Scan using story_dict_pkey on story_dict ftd2
(cost=0.00..6.01 rows=1 width=8)
-> Index Scan using story_pkey on story ft2 (cost=0.00..4.06 rows=1
width=4)
-> Nested Loop (cost=0.00..16.07 rows=1 width=16)
-> Nested Loop (cost=0.00..12.00 rows=1 width=12)
-> Index Scan using word_idx on dict d3 (cost=0.00..5.98 rows=1
width=4)
-> Index Scan using story_dict_pkey on story_dict ftd3
(cost=0.00..6.01 rows=1 width=8)
-> Index Scan using story_pkey on story ft3 (cost=0.00..4.06 rows=1
width=4)

Responses

Re: Hash Join vs Nested Loops in 7.2.1 ... at 2002-04-09 05:07:26 from Tom Lane

Browse pgsql-general by date

	From	Date	Subject
Next Message	Tom Lane	2002-04-09 05:07:26	Re: Hash Join vs Nested Loops in 7.2.1 ...
Previous Message	Tom Lane	2002-04-09 02:52:19	Re: now() AT TIME ZONE 'GMT';