From: | Teodor Sigaev <teodor(at)sigaev(dot)ru> |
---|---|
To: | Lars Haugseth <njus(at)larshaugseth(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Compound words giving undesirable results with tsearch2 |
Date: | 2006-05-30 14:50:10 |
Message-ID: | 447C5BA2.7080503@sigaev.ru |
Lists: | pgsql-general |
> testdb=# select to_tsquery('default_norwegian', 'fritekst');
> to_tsquery
> ------------------------------
> 'fritekst' | 'fri' & 'tekst'
> (1 row)
>
> Now, this will indeed match those records, but it will also match any
> records containing both of the words 'fri' and 'tekst', without regard
> to whether they are next to each other or in completely different parts
> of the text being indexed. In many situations, this will lead to a lot
> of 'false' matches, seen from a user perspective.
This is a deliberate feature (an excerpt from an e-mail from one of our Norwegian customers):
<quotation>
Let us take the compound 'fotballbane'. (Soccer field)
Split : 'fotball' 'fot' 'ball' 'bane'
Example record : "Vedlikehold av baner for fotballklubber"
(Literal translation : "Maintenance of fields for soccer clubs")
The search for 'fotballbane' ('fotballbane' & 'fotball' & 'fot' &
'ball') will not match, even though the record is precisely about this
sort of thing. 'fotballbane' | ('fotball' & 'bane') | ('fot' & 'ball' &
'bane') will match.
</quotation>
So all the ways of splitting a compound word are joined with OR, and the words
within one variant are joined with AND.
If that isn't desirable, you can forbid word splitting in ispell (just comment out the z
flag), or search with a different tsearch configuration.
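To make the OR-of-AND-groups rule concrete, here is a small Python sketch. It is not tsearch2 code: the helper names `build_query` and `matches` are made up for illustration, and the lexeme set for the example record is an assumption about what the Norwegian dictionary would produce, not actual to_tsvector output.

```python
def build_query(word, variants):
    """Join the split variants with OR; join the parts of one variant with AND."""
    groups = ["'%s'" % word]
    for parts in variants:
        groups.append("(" + " & ".join("'%s'" % p for p in parts) + ")")
    return " | ".join(groups)


def matches(lexemes, word, variants):
    """True if the whole word, or every part of at least one variant, is present."""
    lexemes = set(lexemes)
    if word in lexemes:
        return True
    return any(all(p in lexemes for p in parts) for parts in variants)


# Split variants for 'fotballbane', taken from the quotation above.
variants = [["fotball", "bane"], ["fot", "ball", "bane"]]
query = build_query("fotballbane", variants)

# ASSUMED lexemes for "Vedlikehold av baner for fotballklubber"
# (hypothetical; the real set depends on the dictionary in use).
record = {"vedlikehold", "bane", "fotballklubb", "fotball", "klubb", "fot", "ball"}

print(query)
print(matches(record, "fotballbane", variants))
```

The record matches through the ('fotball' & 'bane') variant even though 'fotballbane' itself never occurs in it. The same rule is why a record that merely contains 'fri' and 'tekst' somewhere matches a search for 'fritekst'.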
--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/