
ERROR: syntax error in tsquery - for high-unicode whitespace

From:	hubert depesz lubaczewski <depesz(at)depesz(dot)com>
To:	pgsql-bugs(at)postgresql(dot)org
Subject:	ERROR: syntax error in tsquery - for high-unicode whitespace
Date:	2013-03-15 00:08:38
Message-ID:	20130315000838.GA12142@depesz.com
Lists:	pgsql-bugs

Hi,
I tested this on 9.1 and 9.3. Interestingly, it worked without error in
8.2.

$ select to_tsquery('english', E'a\xe2\x80\x86a');
ERROR: syntax error in tsquery: "a a"

The 3-byte UTF-8 character is U+2006 SIX-PER-EM SPACE (based on info from
http://www.fileformat.info/info/unicode/char/2006/index.htm)
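
For anyone who wants to double-check the byte sequence without the web page, here is a minimal sketch in Python (not part of the PostgreSQL session; the helper name `describe` is just made up for illustration). It decodes the three bytes and asks Python's Unicode database what the character is. Note this only says how Unicode classifies the character, not how the tsquery parser treats it.

```python
import unicodedata

# The same three bytes that appear as \xe2\x80\x86 in the queries.
raw = b"\xe2\x80\x86"
ch = raw.decode("utf-8")  # should decode to exactly one character


def describe(c):
    """Return (code point, Unicode name, general category, str.isspace())."""
    return (f"U+{ord(c):04X}", unicodedata.name(c),
            unicodedata.category(c), c.isspace())


print(len(ch), describe(ch))
```

Category "Zs" is Unicode's "space separator" class, which is why I would expect the character to be handled like whitespace.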

I'm not sure what should happen with it, but I thought that whitespace
characters would generally be ignored (treated as separators) when
building a tsquery.

It seems to work that way when building tsvector though:

$ select to_tsvector('english', E'a\xe2\x80\x86a');
to_tsvector
-------------

(1 row)

And for a larger example:

$ select to_tsvector('english', E'depesz\xe2\x80\x86whatever');
to_tsvector
-----------------------
'depesz':1 'whatev':2
(1 row)

$ select to_tsquery('english', E'depesz\xe2\x80\x86whatever');
ERROR: syntax error in tsquery: "depesz whatever"

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
http://depesz.com/

Responses

Re: ERROR: syntax error in tsquery - for high-unicode whitespace at 2013-03-15 03:56:19 from Tom Lane
