From: | Jesper Krogh <jesper(at)krogh(dot)cc> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-performance(at)postgresql(dot)org, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru> |
Subject: | Re: Queryplan within FTS/GIN index -search. |
Date: | 2009-10-31 06:20:48 |
Message-ID: | 4AEBD740.9060705@krogh.cc |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Tom Lane wrote:
> But having said that, this particular test case is far from compelling.
> Any sane text search application is going to try to filter out
> common words as stopwords; it's only the failure to do that that's
> making this run slow.
Below is tests-runs not with a "commonterm" but and 80% term and a 60%
term.
There are two issues in this, one is the way PG "blows up" when
searching for a stop-word (and it even performs excellent when searching
for a term in the complete doc-base):
ftstest=# select id from ftstest where body_fts @@
to_tsquery('commonterm') limit 10;
id
----
1
2
3
4
5
6
7
8
9
10
(10 rows)
Time: 1.004 ms
ftstest=# select id from ftstest where body_fts @@ to_tsquery('the')
limit 10;
NOTICE: text-search query contains only stop words or doesn't contain
lexemes, ignored
NOTICE: text-search query contains only stop words or doesn't contain
lexemes, ignored
id
----
(0 rows)
Time: 0.587 ms
I can definetely effort the index-size for getting the first behavior to
my application. Stop words will first be really useful when searches for
them translates into full results not errors.
I also think you're trying to limit the scope of the problem more than
whats fair.
ftstest=# select id from ftstest where body_fts @@
to_tsquery('nonexistingterm & commonterm');
id
----
(0 rows)
Time: 28.230 ms
ftstest=# select id from ftstest where body_fts @@
to_tsquery('nonexistingterm') and body_fts @@ to_tsquery('commonterm');
id
----
(0 rows)
Time: 0.930 ms
(so explain analyze is not a fair measurement .. it seems to make the
problem way worse). This is "only" x28
Time: 22.432 ms
ftstest=# select id from ftstest where body_fts @@
to_tsquery('nonexistingterm') and body_fts @@ to_tsquery('commonterm80');
id
----
(0 rows)
Time: 0.992 ms
ftstest=# select id from ftstest where body_fts @@
to_tsquery('nonexistingterm & commonterm80');
id
----
(0 rows)
Time: 22.393 ms
ftstest=#
And for a 80% term .. x23
ftstest=# select id from ftstest where body_fts @@
to_tsquery('nonexistingterm') and body_fts @@ to_tsquery('commonterm60');
id
----
(0 rows)
Time: 0.954 ms
ftstest=# select id from ftstest where body_fts @@
to_tsquery('nonexistingterm & commonterm60');
id
----
(0 rows)
Time: 17.006 ms
and x17
Just trying to say that the body of the problem isn't a discussion about
stop-words.
That being said, if you coin the term "stopword" to mean "any term that
exists in all or close to all documents" then the way it behaves when
searching for only one of them is a situation that we'll hit all the
time. (when dealing with user typed input).
Jesper
--
Jesper
From | Date | Subject | |
---|---|---|---|
Next Message | Greg Stark | 2009-10-31 08:55:34 | Re: Queryplan within FTS/GIN index -search. |
Previous Message | Tom Lane | 2009-10-31 03:11:32 | Re: Queryplan within FTS/GIN index -search. |