From: Jesper Krogh <jesper(at)krogh(dot)cc>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Full text search - query plan? PG 8.4.1
Date: 2009-10-23 20:32:53
Message-ID: 4AE212F5.7080609@krogh.cc
Lists: pgsql-performance
Tom Lane wrote:
> Jesper Krogh <jesper(at)krogh(dot)cc> writes:
>> Tom Lane wrote:
>>> ... There's something strange about your tsvector index. Maybe
>>> it's really huge because the documents are huge?
>
>> huge is a relative term, but length(ts_vector(body)) is about 200 for
>> each document. Is that huge?
>
> It's bigger than the toy example I was trying, but not *that* much
> bigger. I think maybe your index is bloated. Try dropping and
> recreating it and see if the estimates change any.
I'm a bit reluctant to drop it and re-create it. It'll take a
couple of days to regenerate, so that should hopefully not be a common
situation for the system.
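If it does come to a rebuild, I would try to do it without a long lock,
something like the following sketch (assuming a GIN index on the
tsvector column; names taken from the plans below):

-- Sketch of a lock-light rebuild: build a second index concurrently,
-- then swap it in (assumes a GIN index on the tsvector column):
CREATE INDEX CONCURRENTLY ftstest_tfs_idx_new
    ON ftstest USING gin (ftstest_body_fts);
DROP INDEX ftstest_tfs_idx;
ALTER INDEX ftstest_tfs_idx_new RENAME TO ftstest_tfs_idx;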
I have set the statistics target to 1000 for the tsvector column; the
documentation didn't mention any major downsides of doing that, and
since then I haven't seen row estimates that are orders of magnitude off.
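For reference, the setting was changed with something along these lines
(sketch; column name taken from the plans below):

-- Raise the per-column statistics target for the tsvector column,
-- then re-gather statistics:
ALTER TABLE ftstest ALTER COLUMN ftstest_body_fts SET STATISTICS 1000;
ANALYZE ftstest;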
It was built from scratch using inserts, all the way up to around 10m
rows now; should that result in index bloat? Can I inspect the amount of
bloat without rebuilding (or a similarly locking operation)?
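The only non-locking check I know of is the raw index size, which gives
the size but not the bloat fraction; something like:

-- Non-locking look at the raw size of the fulltext index (index name
-- taken from the EXPLAIN output below); size only, not bloat fraction:
SELECT pg_size_pretty(pg_relation_size('ftstest_tfs_idx'));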
The query still has a "wrong" tipping point between the two query plans:
ftstest=# explain analyze select body from ftstest where
ftstest_body_fts @@ to_tsquery('testterm') order by id limit 100;
                                                                      QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..7357.77 rows=100 width=738) (actual time=3978.974..8595.086 rows=100 loops=1)
   ->  Index Scan using ftstest_id_pri_idx on ftstest  (cost=0.00..1436458.05 rows=19523 width=738) (actual time=3978.971..8594.932 rows=100 loops=1)
         Filter: (ftstest_body_fts @@ to_tsquery('testterm'::text))
 Total runtime: 8595.222 ms
(4 rows)
ftstest=# set enable_indexscan=off;
SET
ftstest=# explain analyze select body from ftstest where
ftstest_body_fts @@ to_tsquery('testterm') order by id limit 100;
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=59959.61..59959.86 rows=100 width=738) (actual time=338.832..339.055 rows=100 loops=1)
   ->  Sort  (cost=59959.61..60008.41 rows=19523 width=738) (actual time=338.828..338.908 rows=100 loops=1)
         Sort Key: id
         Sort Method:  top-N heapsort  Memory: 32kB
         ->  Bitmap Heap Scan on ftstest  (cost=22891.18..59213.45 rows=19523 width=738) (actual time=5.097..316.780 rows=19444 loops=1)
               Recheck Cond: (ftstest_body_fts @@ to_tsquery('testterm'::text))
               ->  Bitmap Index Scan on ftstest_tfs_idx  (cost=0.00..22886.30 rows=19523 width=0) (actual time=4.259..4.259 rows=20004 loops=1)
                     Index Cond: (ftstest_body_fts @@ to_tsquery('testterm'::text))
 Total runtime: 339.201 ms
(9 rows)
So, to get 100 rows where the term matches 19,444 of the 10,000,000
documents, it chooses the index scan, which (given a random distribution
of the matching documents) should have to scan about
100 * (10,000,000 / 19,444) = 51,429 documents.
So it somehow believes either that the cost of the bitmap index scan is
higher than it actually is, or that the cost of the index scan is lower
than it actually is.
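The same back-of-the-envelope figure, as a quick sanity check:

-- Rows the id-ordered index scan should expect to visit to find 100
-- matches, assuming matches are spread uniformly over the id range:
SELECT floor(100 * (10000000.0 / 19444));  -- 51429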
Is it possible to manually set the cost for the @@ operator? It seems
natural that matching a tsvector against a tsquery, which is a much
heavier operation than = and operates on values kept in EXTENDED
storage, should be costed much higher than comparing an integer in
plain storage.
I tried to search the docs for per-operator costs, but I only found the
overall base values (cpu_operator_cost and friends) in the
configuration file.
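Would something along these lines be the right knob? This is an
untested sketch; according to pg_operator, ts_match_vq is the function
behind tsvector @@ tsquery, and the cost value of 100 is just an
arbitrary guess:

-- Untested sketch: raise the planner's cost estimate for the function
-- implementing tsvector @@ tsquery (ts_match_vq per pg_operator).
-- COST is a multiplier on cpu_operator_cost; 100 is an arbitrary guess.
ALTER FUNCTION ts_match_vq(tsvector, tsquery) COST 100;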
--
Jesper