From: | "Robert Haas" <robertmhaas(at)gmail(dot)com> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | why hash on the primary key? |
Date: | 2008-11-28 19:55:22 |
Message-ID: | 603c8f070811281155u50b1edb9i85e4045677f729e3@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
I'm seeing a lot of plans in my database that look like this:
portal=# explain select * from foo i, foo j where i.id = j.id;
QUERY PLAN
-------------------------------------------------------------------------
Hash Join (cost=769.87..2159.36 rows=13283 width=264)
Hash Cond: (i.id = j.id)
-> Seq Scan on foo i (cost=0.00..343.83 rows=13283 width=132)
-> Hash (cost=343.83..343.83 rows=13283 width=132)
-> Seq Scan on foo j (cost=0.00..343.83 rows=13283 width=132)
It seems very strange for the planner to decide to build an in-memory
hash table on a column that is already indexed (the primary key, no
less!). But this is happening A LOT - I often see plans where a
majority of the joins are executed this way (and they're not all
self-joins either...). It seems like the planner is concluding that
it's going to need most or all of the pages in the table anyway, and
that building a hash table as it goes is quicker than reading the
index pages in from disk. On a simple query like the above, setting
enable_seqscan to off or random_page_cost to 1 generates the expected
plan:
QUERY PLAN
-------------------------------------------------------------------------------------
Merge Join (cost=0.00..2534.24 rows=13283 width=264)
Merge Cond: (i.id = j.id)
-> Index Scan using foo_pkey on foo i (cost=0.00..1167.50
rows=13283 width=132)
-> Index Scan using foo_pkey on foo j (cost=0.00..1167.50
rows=13283 width=132)
(4 rows)
Experimentation shows this is actually about 25% faster. But, this is
kind of a blunt instrument, and my attempts to fiddle with various
parameters have not been real succesful in generating better plans for
more complicated examples.
Any suggestions/explanations?
...Robert
From | Date | Subject | |
---|---|---|---|
Next Message | Adam Rich | 2008-11-28 20:04:58 | Re: why hash on the primary key? |
Previous Message | Ron Mayer | 2008-11-28 19:43:31 | Re: Very large tables |