| From: | hubert depesz lubaczewski <depesz(at)depesz(dot)com> | 
|---|---|
| To: | pgsql-bugs(at)postgresql(dot)org | 
| Subject: | ERROR: syntax error in tsquery - for high-unicode whitespace | 
| Date: | 2013-03-15 00:08:38 | 
| Message-ID: | 20130315000838.GA12142@depesz.com | 
| Lists: | pgsql-bugs | 
Hi,
This was tested on 9.1 and 9.3. Interestingly, it worked without error in
8.2.
$ select to_tsquery('english', E'a\xe2\x80\x86a');
ERROR:  syntax error in tsquery: "a a"
The 3-byte UTF-8 character is U+2006 SIX-PER-EM SPACE (based on info from
http://www.fileformat.info/info/unicode/char/2006/index.htm).
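As a quick cross-check of that identification, the short Python snippet below decodes the same three bytes used in the E'...' literals and asks the Unicode database what they are. It is only a sanity check of the character itself and says nothing about how PostgreSQL handles it.

```python
import unicodedata

# The three bytes used in the E'...' literals: \xe2\x80\x86
raw = b"\xe2\x80\x86"

ch = raw.decode("utf-8")         # a 3-byte UTF-8 sequence decodes to one code point
print(f"U+{ord(ch):04X}")        # U+2006
print(unicodedata.name(ch))      # SIX-PER-EM SPACE
print(unicodedata.category(ch))  # Zs (space separator)
print(ch.isspace())              # True: Unicode treats it as whitespace
print(ch.isascii())              # False: it is not one of the ASCII space characters
```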
I'm not sure what should happen with it, but generally I thought that
whitespace characters would be ignored (treated as separators) when
building a tsquery.
It seems to work that way when building tsvector though:
$ select to_tsvector('english', E'a\xe2\x80\x86a');
 to_tsvector 
-------------
  
(1 row)
And for a larger example:
$ select to_tsvector('english', E'depesz\xe2\x80\x86whatever');
      to_tsvector      
-----------------------
 'depesz':1 'whatev':2
(1 row)
$ select to_tsquery('english', E'depesz\xe2\x80\x86whatever');
ERROR:  syntax error in tsquery: "depesz whatever"
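One possible reading of the contrast above is that the tsvector and tsquery code paths disagree about what counts as whitespace. The sketch below is a hypothetical illustration in Python, not PostgreSQL's parser: the function names `split_ascii_only` and `split_unicode` are invented for this example, and it only shows how an ASCII-only notion of whitespace keeps U+2006 inside a single token while a Unicode-aware one splits on it.

```python
import re

# Hypothetical illustration only: neither function is PostgreSQL code.

def split_ascii_only(text):
    # With re.ASCII, \s matches only [ \t\n\r\f\v], so U+2006 is not a separator.
    return [t for t in re.split(r"\s+", text, flags=re.ASCII) if t]

def split_unicode(text):
    # Default (Unicode) \s matches any Unicode whitespace, including U+2006.
    return [t for t in re.split(r"\s+", text) if t]

query = "depesz\u2006whatever"
print(split_ascii_only(query))  # one token, U+2006 still inside it
print(split_unicode(query))     # ['depesz', 'whatever']
```

Under the ASCII-only rule, a query parser would see one odd token instead of two words, which would be consistent with the error shown, but that is an assumption, not a diagnosis.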
Best regards,
depesz
-- 
The best thing about modern society is how easy it is to avoid contact with it.
                                                             http://depesz.com/