From: | Jesper Krogh <jesper(at)krogh(dot)cc> |
---|---|
To: | pgsql-performance(at)postgresql(dot)org |
Subject: | Queryplan within FTS/GIN index -search. |
Date: | 2009-10-22 16:28:13 |
Message-ID: | 4AE0881D.9070201@krogh.cc |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Hi
My indexing base is now up to 7.5m documents, I have raise statistics
target to 1000 for the tsvector column in order to make the
query-planner choose more correctly. That works excellent.
Table structure is still:
ftstest=# \d ftsbody
Table "public.ftsbody"
Column | Type | Modifiers
------------------+----------+------------------------------------------------------
id | integer | not null default
nextval('ftsbody_id_seq'::regclass)
body | text | not null default ''::text
ftsbody_body_fts | tsvector |
Indexes:
"ftsbody_body_md5" UNIQUE, btree (md5(body))
"ftsbody_id_pri_idx" UNIQUE, btree (id)
"ftsbody_tfs_idx" gin (ftsbody_body_fts)
Triggers:
tsvectorupdate BEFORE INSERT OR UPDATE ON uniprot FOR EACH ROW
EXECUTE PROCEDURE tsvector_update_trigger('ftsbody_body_fts',
'pg_catalog.english', 'body')
I'm searching the gin-index for 1-5 terms, where all of them matches the
same document. TERM1 is unique by itself, TERM2 is a bit more common (52
rows), TERM3 more common, TERM4 close to all and TERM5 all records.
Just quering for a unique value and add in several values that match
everything makes the run-time go significantly up.
I somehow would expect the index-search to take advantage of the MCV's
informations in the statistics that sort of translate it into a search
and post-filtering (as PG's queryplanner usually does at the SQL-level).
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=102.45..102.45 rows=1 width=751) (actual time=3.726..3.729
rows=1 loops=1)
-> Sort (cost=102.45..102.45 rows=1 width=751) (actual
time=3.722..3.723 rows=1 loops=1)
Sort Key: id
Sort Method: quicksort Memory: 27kB
-> Bitmap Heap Scan on ftsbody (cost=100.42..102.44 rows=1
width=751) (actual time=3.700..3.702 rows=1 loops=1)
Recheck Cond: (ftsbody_body_fts @@ to_tsquery('TERM1 &
TERM2'::text))
-> Bitmap Index Scan on ftsbody_tfs_idx
(cost=0.00..100.42 rows=1 width=0) (actual time=3.683..3.683 rows=1 loops=1)
Index Cond: (ftsbody_body_fts @@ to_tsquery('TERM1
& TERM2'::text))
Total runtime: 3.790 ms
(9 rows)
QUERY
PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=102.45..102.45 rows=1 width=751) (actual
time=850.017..850.020 rows=1 loops=1)
-> Sort (cost=102.45..102.45 rows=1 width=751) (actual
time=850.013..850.015 rows=1 loops=1)
Sort Key: id
Sort Method: quicksort Memory: 27kB
-> Bitmap Heap Scan on ftsbody (cost=100.42..102.44 rows=1
width=751) (actual time=849.991..849.993 rows=1 loops=1)
Recheck Cond: (ftsbody_body_fts @@ to_tsquery('TERM1 &
TERM2 & TERM3'::text))
-> Bitmap Index Scan on ftsbody_tfs_idx
(cost=0.00..100.42 rows=1 width=0) (actual time=849.970..849.970 rows=1
loops=1)
Index Cond: (ftsbody_body_fts @@ to_tsquery('TERM1
& TERM2 & TERM3'::text))
Total runtime: 850.084 ms
(9 rows)
QUERY
PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=102.45..102.45 rows=1 width=751) (actual
time=1152.065..1152.068 rows=1 loops=1)
-> Sort (cost=102.45..102.45 rows=1 width=751) (actual
time=1152.061..1152.062 rows=1 loops=1)
Sort Key: id
Sort Method: quicksort Memory: 27kB
-> Bitmap Heap Scan on ftsbody (cost=100.42..102.44 rows=1
width=751) (actual time=1152.039..1152.041 rows=1 loops=1)
Recheck Cond: (ftsbody_body_fts @@ to_tsquery('TERM1 &
TERM2 & TERM3 & TERM4'::text))
-> Bitmap Index Scan on ftsbody_tfs_idx
(cost=0.00..100.42 rows=1 width=0) (actual time=1152.020..1152.020
rows=1 loops=1)
Index Cond: (ftsbody_body_fts @@ to_tsquery('TERM1
& TERM2 & TERM3 & TERM4'::text))
Total runtime: 1152.129 ms
(9 rows)
QUERY
PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=102.45..102.45 rows=1 width=751) (actual
time=1509.043..1509.046 rows=1 loops=1)
-> Sort (cost=102.45..102.45 rows=1 width=751) (actual
time=1509.040..1509.040 rows=1 loops=1)
Sort Key: id
Sort Method: quicksort Memory: 27kB
-> Bitmap Heap Scan on ftsbody (cost=100.42..102.44 rows=1
width=751) (actual time=1509.018..1509.020 rows=1 loops=1)
Recheck Cond: (ftsbody_body_fts @@ to_tsquery('TERM1 &
TERM2 & TERM3 & TERM4 & TERM5'::text))
-> Bitmap Index Scan on ftsbody_tfs_idx
(cost=0.00..100.42 rows=1 width=0) (actual time=1508.998..1508.998
rows=1 loops=1)
Index Cond: (ftsbody_body_fts @@ to_tsquery('TERM1
& TERM2 & TERM3 & TERM4 & TERM5'::text))
Total runtime: 1509.109 ms
(9 rows)
Can (perhaps more readable) be found at http://krogh.cc/~jesper/test.out
Can this be optimized? (I cannot really prevent users from typing stuff
in that are common).
--
Jesper
From | Date | Subject | |
---|---|---|---|
Next Message | Kevin Buckham | 2009-10-22 18:50:44 | Table Clustering & Time Range Queries |
Previous Message | Robert Haas | 2009-10-22 13:26:56 | Re: optimizing query with multiple aggregates |