Quick Links

small table left outer join big table

From:	Jie Li <jay23jack(at)gmail(dot)com>
To:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	small table left outer join big table
Date:	2010-12-28 10:13:40
Message-ID:	AANLkTikNKpOfM=OPzfMgi9_q2tyUVSSPA0vBYaJ2_mk4@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi,

Please see the following plan:

postgres=# explain select * from small_table left outer join big_table using
(id);
QUERY PLAN
----------------------------------------------------------------------------
Hash Left Join (cost=126408.00..142436.98 rows=371 width=12)
Hash Cond: (small_table.id = big_table.id)
-> Seq Scan on small_table (cost=0.00..1.09 rows=9 width=8)
-> Hash (cost=59142.00..59142.00 rows=4100000 width=8)
-> Seq Scan on big_table (cost=0.00..59142.00 rows=4100000
width=8)
(5 rows)

Here I have a puzzle, why not choose the small table to build hash table? It
can avoid multiple batches thus save significant I/O cost, isn't it?

We can perform this query in two phases:
1) inner join, using the small table to build hash table.
2) check whether each tuple in the hash table has matches before, which can
be done with another flag bit

The only compromise is the output order, due to the two separate phases. Not
sure whether the SQL standard requires it.

Thanks,
Li Jie

Responses

Re: small table left outer join big table at 2010-12-28 15:09:51 from Gurjeet Singh
Re: small table left outer join big table at 2010-12-29 12:17:17 from Robert Haas

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Itagaki Takahiro	2010-12-28 10:23:43	Re: SQL/MED - core functionality
Previous Message	Heikki Linnakangas	2010-12-28 09:59:09	Re: SQL/MED - core functionality