From: | Denis Papathanasiou <denis(dot)papathanasiou(at)gmail(dot)com> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Subject: | Re: Differences between to_tsquery() and plainto_tsquery() when it comes to applying an index? |
Date: | 2010-08-25 19:05:34 |
Message-ID: | 4C75697E.2030005@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
> Not per se, but maybe looking at the actual outputs of the two function
> calls would be enlightening. I suspect you'll find that the search
> conditions you are getting are not equivalent.
Strictly speaking they're not, b/c the plainto_tsquery() is chaining
several tokens together.
However, at the heart of the question is this: if I define the index on
that column like this:
CREATE INDEX item_eng_searchable_text_idx ON item USING
gin(to_tsvector('english', searchable_text));
since, unlike the example in the
http://www.postgresql.org/docs/8.4/interactive/textsearch-tables.html
doc, I cannot define the searchable
text column in the table as being being of type ts_vector(), because the
text is not exclusively in English;
then, will the index be effective?
Based on some empirical experiments, it seems not.
If that's indeed that case, then how should I create the index?
> In that case you haven't understood how text search works at all.
> It simply doesn't do that. You could possibly follow up a text search
> for component words with a LIKE or similar comparison to verify that
> the matches actually contain the desired string.
I cannot use LIKE on each token of the phrase because the text in the
column is unordered, and I would have to do an ILIKE '%'+token+'%' on
each to be logically correct.
IIRC, wildcards of the form token+'%' can use an index efficiently, but
wildcards on both ends such as '%'+token+'%' do not.
I did think about splitting the phrase tokens and doing a tsquery() join
on each token, but it seemed that's why plainto_tsquery() was created,
unless I misunderstood the docs on that specific point.
From | Date | Subject | |
---|---|---|---|
Next Message | bricklen | 2010-08-25 19:20:02 | Re: unexpected message type 0x58 during COPY from stdin |
Previous Message | Tom Lane | 2010-08-25 19:01:46 | Re: unexpected message type 0x58 during COPY from stdin |