Re: to_tsvector in 8.2.3

From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: richardcraig <richard(at)v3fm(dot)com>, pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: to_tsvector in 8.2.3
Date: 2007-03-21 18:13:55
Message-ID: 460175E3.40601@sigaev.ru
Lists: pgsql-general

> postgres=# select to_tsvector('test text');
> to_tsvector
> ---------------
> 'test text':1
> (1 row)
OK, that's related to this commit:
http://developer.postgresql.org/cvsweb.cgi/pgsql/contrib/tsearch2/wordparser/parser.c.diff?r1=1.11;r2=1.12;f=h
Thomas pointed out that the character could be a non-breaking space (0xa0), and that
commit assumes that, with the C locale and a multibyte encoding, any character above
0x7f is alphabetic. To check this theory, please apply the attached patch.
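To make the suspected failure concrete, here is a minimal Python sketch. It is not the tsearch2 parser itself, and `is_word_byte` and `split_words` are hypothetical names. It shows a tokenizer following the rule described above: ASCII letters and digits count as word characters, and so does every byte above 0x7f. Under that rule a non-breaking space (0xa0) between two words does not split them, which is consistent with the single lexeme 'test text' in the output quoted above.

```python
def is_word_byte(b: int) -> bool:
    """Hypothetical rule: ASCII letters/digits are word bytes, and (per the
    heuristic discussed) so is every byte above 0x7f, including 0xa0."""
    if b > 0x7F:
        return True
    return chr(b).isalnum()


def split_words(data: bytes) -> list[bytes]:
    """Split a byte string into maximal runs of word bytes (hypothetical tokenizer)."""
    words, current = [], bytearray()
    for b in data:
        if is_word_byte(b):
            current.append(b)
        elif current:
            # A non-word byte ends the current word.
            words.append(bytes(current))
            current = bytearray()
    if current:
        words.append(bytes(current))
    return words


print(split_words(b"test text"))     # ASCII space (0x20) separates the words
print(split_words(b"test\xa0text"))  # 0xa0 is treated as alpha, so one "word"
```

The first call yields two tokens and the second yields a single token containing the 0xa0 byte.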

If so, I'm confused: we cannot assume that 0xa0 is a space character in an
arbitrary multibyte encoding, even on Windows.
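A quick Python check (not part of the original message) illustrates the encoding point, using UTF-8 as the example multibyte encoding. In single-byte Windows-1252 and Latin-1, NO-BREAK SPACE (U+00A0) is the lone byte 0xa0. In UTF-8 it is two bytes, and 0xa0 on its own can be a continuation byte of an unrelated letter such as "à".

```python
nbsp = "\u00a0"  # NO-BREAK SPACE

# Single-byte encodings: NBSP is exactly the byte 0xa0.
print(nbsp.encode("cp1252"))    # b'\xa0'
print(nbsp.encode("latin-1"))   # b'\xa0'

# UTF-8: NBSP takes two bytes.
print(nbsp.encode("utf-8"))     # b'\xc2\xa0'

# The same 0xa0 byte also appears inside ordinary letters in UTF-8,
# e.g. "à" (U+00E0), so seeing 0xa0 does not imply a space.
print("\u00e0".encode("utf-8"))  # b'\xc3\xa0'

# A bare 0xa0 byte is not even a valid UTF-8 sequence by itself.
try:
    b"\xa0".decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)
```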

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/

Attachment: nonbreak.patch (text/plain, 609 bytes)
