From: | somloieater(at)gmail(dot)com |
---|---|
To: | pgsql-bugs(at)postgresql(dot)org |
Subject: | BUG #7999: Regexp with utf8 |
Date: | 2013-03-27 10:32:57 |
Message-ID: | E1UKnf7-0005Sa-L4@wrigleys.postgresql.org |
Lists: | pgsql-bugs |
The following bug has been logged on the website:
Bug reference: 7999
Logged by: david
Email address: somloieater(at)gmail(dot)com
PostgreSQL version: 9.1.8
Operating system: linux
Description:
\y and \Y do not behave correctly next to
multibyte UTF-8 characters - they seem to invert their senses:
Proper behaviour with ASCII e
'es'~$$\y[eɛ]s$$ => t
Inverted behaviour with epsilon
'ɛs'~$$\y[eɛ]s$$ => f
'ɛs'~$$[eɛ]\ys$$ => t
'ɛs'~$$[eɛ]\Ys$$ => f
This seems to be a case of utf8 characters not being recognised as
word-forming:
'ɛ'~$$\w$$ => f
I've checked a few other characters that take more than one byte in UTF-8. U+00F0
counts as \w, but nothing I've tried above U+00FF matches. I wonder whether it has
something to do with code points above 255?
In case anyone else hits this bug, replacing \y with
(^|$|\s|[[:punct:]]) seems to work for me, although it's ugly.
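A minimal sketch of that substitution, for anyone who wants to try it. The alternation group is copied from the report; the test strings and column aliases are only illustrative, and the results will depend on the server version and database encoding, so none are asserted here.

```sql
-- Pattern from the report that misbehaves for 'ɛs' on the affected version:
SELECT 'ɛs' ~ $$\y[eɛ]s$$ AS with_y;

-- Same test with the leading \y replaced by the explicit alternation:
SELECT 'ɛs' ~ $$(^|$|\s|[[:punct:]])[eɛ]s$$ AS with_workaround;

-- ASCII control case, which the report says already works with \y:
SELECT 'es' ~ $$(^|$|\s|[[:punct:]])[eɛ]s$$ AS ascii_control;
```

One difference to keep in mind: \y is zero-width, whereas the \s and [[:punct:]] branches consume the delimiter character. For a plain match test this should not matter, but with regexp_replace the delimiter becomes part of the match, so it would need to be captured and written back (for example with \1 in the replacement).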