IN subquery not using a hash

From: Paul Tillotson <spam1011(at)adelphia(dot)net>
To: pgsql-general(at)postgresql(dot)org
Subject: IN subquery not using a hash
Date: 2005-07-20 23:33:19
Message-ID: 42DEDF3F.200@adelphia.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

For the following query, postgres is running the IN subquery over and
over again (once for each row scanned in the parent table.)

I would have expected it to run the whole query once and create a hash
which would then be probed once for every row scanned in the parent
table. I assumed that it was not doing so because it thought that the
resulting hash table would exceed sort_mem, but setting sort_mem to half
a gigabyte did not make any difference. Is there some other reason that
the optimizer is not using a hash table?

563 pages * 8 KB per page * 296 tuples fetched / 52085 tuples in the
whole table = 25 KB. Shouldn't the optimizer think that the subquery
will only fetch 25 KB worth of rows?

(Later, I realized that the official name for "sort_mem" is now
work_mem. Now, does this mean that my set sort_mem = 500000 did not do
anything?)

Regards,
Paul Tillotson

omnis=> select version();
version
---------------------------------------------------------------------------------------------------------
PostgreSQL 8.0.3 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.3.3
20040412 (Red Hat Linux 3.3.3-7)
(1 row)
omnis=> show sort_mem;
20000
omnis=> set sort_mem = 500000;
SET
omnis=> explain analyze select 1 from parents where lname = 'SMITH' OR
parentid IN (SELECT parentid FROM child where childlast = 'SMITH');
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on parents (cost=357.78..226649.83 rows=59785 width=0) (actual
time=127.644..104568.639 rows=855 loops=1)
Filter: (((lname)::text = 'SMITH'::text) OR (subplan))
SubPlan
-> Materialize (cost=357.78..360.74 rows=296 width=4) (actual
time=0.001..0.257 rows=313 loops=117943)
-> Index Scan using child_childlast_index on child
(cost=0.00..357.48 rows=296 width=4) (actual time=0.073..1.325 rows=313
loops=1)
Index Cond: ((childlast)::text = 'SMITH'::text)
Total runtime: 104569.800 ms
(7 rows)

omnis=> select reltuples, relpages from pg_class where relname = 'child';
reltuples | relpages
-----------+----------
52085 | 563
(1 row)

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2005-07-21 00:04:17 Re: IN subquery not using a hash
Previous Message Ed L. 2005-07-20 20:49:09 Re: help: production db stuck in startup mode