From: | Gábor Farkas <gabor(at)nekomancer(dot)net> |
---|---|
To: | pgsql-performance(at)postgresql(dot)org |
Subject: | postgresql uses Hash-join, i need Nested-loop |
Date: | 2009-08-24 06:54:46 |
Message-ID: | 2c173a350908232354r3f4f75f1p48e604f784b4ff93@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
hi,
i have a query that uses a Hash-Join, but it would be faster with Nested-Loop,
and i don't know how to persuade postgresql to do it.
details:
postgresql-8.2 + tsearch2
i have 2 tables, one for people, and one that does a many-to-many
link between people:
CREATE TABLE personlink (
id integer NOT NULL,
relid integer NOT NULL,
created timestamp with time zone DEFAULT now() NOT NULL,
changed timestamp with time zone,
editorid integer NOT NULL
);
btree indexes on "id" and "relid",
PRIMARY KEY btree index on (id,relid).
CREATE TABLE person (
id integer NOT NULL,
firstname character varying(255),
.
.
.
);
PRIMARY KEY btree index on "id".
gin index on "firstname" (for tsearch2)
(the "person" table contains more columns (around 30))
personlink contains 1.500.000 rows, person contains 900.000 rows.
i did a vacuum-with-analyze.
my query is:
SELECT personlink.id
FROM personlink
INNER JOIN person ON personlink.relid=person.id
WHERE to_tsquery('default','duck') @@ to_tsvector('default',person.firstname);
explain analyze says this:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=3108.62..35687.67 rows=1535 width=4) (actual
time=901.110..6113.683 rows=2 loops=1)
Hash Cond: (personlink.relid = person.id)
-> Seq Scan on personlink (cost=0.00..26805.14 rows=1535614
width=8) (actual time=0.029..3000.503 rows=1535614 loops=1)
-> Hash (cost=3097.80..3097.80 rows=866 width=4) (actual
time=0.185..0.185 rows=8 loops=1)
-> Bitmap Heap Scan on person (cost=23.09..3097.80 rows=866
width=4) (actual time=0.078..0.160 rows=8 loops=1)
Recheck Cond: ('''duck'''::tsquery @@
to_tsvector('default'::text, (firstname)::text))
-> Bitmap Index Scan on person_firstname_exact
(cost=0.00..22.87 rows=866 width=0) (actual time=0.056..0.056 rows=8
loops=1)
Index Cond: ('''duck'''::tsquery @@
to_tsvector('default'::text, (firstname)::text))
Total runtime: 6113.748 ms
(9 rows)
if i disable hash-joins with "SET enable_hashjoin =false;"
i get:
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..45698.23 rows=1535 width=4) (actual
time=4.960..15.098 rows=2 loops=1)
-> Index Scan using person_firstname_exact on person
(cost=0.00..3463.53 rows=866 width=4) (actual time=0.117..0.234 rows=8
loops=1)
Index Cond: ('''duck'''::tsquery @@
to_tsvector('default'::text, (firstname)::text))
-> Index Scan using personlink_relid_idx on personlink
(cost=0.00..48.54 rows=18 width=8) (actual time=1.848..1.849 rows=0
loops=8)
Index Cond: (personlink.relid = person.id)
Total runtime: 15.253 ms
(6 rows)
what could i do to persuade postgresql to choose the faster Nested-Loop?
thanks,
gabor
From | Date | Subject | |
---|---|---|---|
Next Message | Fred Janon | 2009-08-24 09:24:59 | How to create a multi-column index with 2 dates using 'gist'? |
Previous Message | Jeff Janes | 2009-08-23 22:26:16 | Re: [PERFORMANCE] how to set wal_buffers |