From: | Teodor Sigaev <teodor(at)sigaev(dot)ru> |
---|---|
To: | Tomi NA <hefest(at)gmail(dot)com> |
Cc: | pgsql-general <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: full text search: the concept of a "word" |
Date: | 2006-04-20 23:55:58 |
Message-ID: | 44481F8E.1050800@sigaev.ru |
Lists: | pgsql-general |
> My textfields are trigger-generated using information from a number of
> tables: these fields can be, say, a couple of thousand characters
> wide.
> Up to here, there's no problem.
> What I'd like to do is define - possibly using regexps - what
> constitutes a word. For instance, my word separator is a semicolon,
> not a space; a dash is not a separator, and neither are language
> specific characters (which might be interpreted that way by a language
> agnostic tool)...
> BTW, I use UTF-8 as my database encoding if it's of any importance.
I do not see a big problem: just write your own parser.
UTF-8 may be a problem, though: only the CVS HEAD version of tsearch2 supports UTF-8. But you
can find a patch for 8.1 at http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/
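
To make the word rule from the question concrete, here is a minimal sketch in Python of the splitting logic only. It is not the tsearch2 parser API (a real tsearch2 parser is written in C and registered with the module); the function name `split_words` and the choice to trim surrounding whitespace and drop empty pieces are assumptions made for illustration.

```python
def split_words(text, separator=";"):
    """Split text into words on the separator only.

    Dashes, spaces, and non-ASCII letters are kept inside a word.
    Trimming whitespace and skipping empty pieces are assumptions
    of this sketch, not requirements stated in the thread.
    """
    words = []
    for piece in text.split(separator):
        word = piece.strip()
        if word:  # skip empty pieces produced by ";;" or a trailing ";"
            words.append(word)
    return words


if __name__ == "__main__":
    # Hypothetical sample field; the semicolon is the only separator.
    print(split_words("Zagreb-centar; čaj od šipka;;full text"))
```

A custom parser would apply the same rule while scanning the field, emitting one token per semicolon-delimited piece rather than one per space-delimited piece.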
--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/