From: | Teodor Sigaev <teodor(at)sigaev(dot)ru> |
---|---|
To: | Thomas Pundt <thomas(dot)pundt(at)rp-online(dot)de> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: to_tsvector in 8.2.3 |
Date: | 2007-03-21 15:26:19 |
Message-ID: | 46014E9B.1080301@sigaev.ru |
Lists: | pgsql-general |
8.2 has a fully rewritten text parser based on the POSIX is* functions.
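
A minimal sketch of what relying on the is* functions implies: the class of a byte such as 0xA0 is decided by the active LC_CTYPE locale rather than by ranges hard-coded in the lexer. The helpers `classify_byte` and `classify_in_locale` are hypothetical names used only for illustration and are not taken from the PostgreSQL sources.

```c
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: classify one byte with the standard is* functions.
 * Returns 'a' = alphabetic, 'd' = digit, 's' = whitespace, 'o' = other.
 * The parameter is unsigned char because passing a negative value
 * (other than EOF) to the is* functions is undefined behaviour. */
static char classify_byte(unsigned char c)
{
    if (isalpha(c)) return 'a';
    if (isdigit(c)) return 'd';
    if (isspace(c)) return 's';
    return 'o';
}

/* Hypothetical helper: switch LC_CTYPE, then classify the byte.
 * Returns '?' when the requested locale is not installed. */
static char classify_in_locale(const char *locale_name, unsigned char c)
{
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return '?';
    return classify_byte(c);
}
```

In the "C" locale, `classify_in_locale("C", 0xA0)` reports the byte as neither alphabetic nor whitespace. Under a single-byte locale the answer depends on how that platform's locale data classifies 0xA0, which is one reason the result can differ between operating systems.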
Thomas Pundt wrote:
> On Wednesday 21 March 2007 14:25, Teodor Sigaev wrote:
> | I can't reproduce your problem, but I don't have a Windows box. Can anybody
> | reproduce that?
>
> Just a wild guess: I once had a similar phenomenon and tracked it down
> to a "non-breaking space character" (0xA0). Since then I have been patching the
> tsearch2 lexer:
>
> --- postgresql-8.1.5/contrib/tsearch2/wordparser/parser.l
> +++ postgresql-8.1.4/contrib/tsearch2/wordparser/parser.l
> @@ -78,8 +78,8 @@
> /* cyrillic koi8 char */
> CYRALNUM [0-9\200-\377]
> CYRALPHA [\200-\377]
> -ALPHA [a-zA-Z\200-\377]
> -ALNUM [0-9a-zA-Z\200-\377]
> +ALPHA [a-zA-Z\200-\237\241-\377]
> +ALNUM [0-9a-zA-Z\200-\237\241-\377]
>
>
> HOSTNAME ([-_[:alnum:]]+\.)+[[:alpha:]]+
> @@ -307,7 +307,7 @@
> return UWORD;
> }
>
> -[ \r\n\t]+ {
> +[ \240\r\n\t]+ {
> token = tsearch2_yytext;
> tokenlen = tsearch2_yyleng;
> return SPACE;
>
>
> Ciao,
> Thomas
>
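
For readers who do not read flex character classes fluently, the quoted patch can be restated as plain C predicates. This is only an illustrative sketch: the function names are hypothetical, and the byte ranges are copied from the octal escapes in the diff (\200 = 0x80, \237 = 0x9F, \240 = 0xA0, \241 = 0xA1, \377 = 0xFF).

```c
/* ALPHA before the patch: [a-zA-Z\200-\377]
 * Every high byte, including 0xA0, counts as a letter. */
static int is_alpha_original(unsigned char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c >= 0x80;
}

/* ALPHA after the patch: [a-zA-Z\200-\237\241-\377]
 * Same as above, except that byte 0xA0 is excluded. */
static int is_alpha_patched(unsigned char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           (c >= 0x80 && c <= 0x9F) || c >= 0xA1;
}

/* SPACE rule after the patch: [ \240\r\n\t]
 * Byte 0xA0 is now treated as a separator. */
static int is_space_patched(unsigned char c)
{
    return c == ' ' || c == 0xA0 || c == '\r' || c == '\n' || c == '\t';
}
```

With the original class, 0xA0 (the no-break space in ISO-8859-1) is glued into the neighbouring word; with the patched classes it ends the word like an ordinary space. Other high bytes, such as 0xE4, remain letters in both versions.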
--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/