From: | "Jean-Baptiste Quenot" <jbq(at)caraldi(dot)com> |
---|---|
To: | pgsql-bugs(at)postgresql(dot)org |
Subject: | BUG #4200: Regexp character classes not UTF8-compliant |
Date: | 2008-05-26 19:13:05 |
Message-ID: | 200805261913.m4QJD5gh048059@wwwmaster.postgresql.org |
Lists: | pgsql-bugs |
The following bug has been logged online:
Bug reference: 4200
Logged by: Jean-Baptiste Quenot
Email address: jbq(at)caraldi(dot)com
PostgreSQL version: 8.3.1
Operating system: Linux Ubuntu Hardy
Description: Regexp character classes not UTF8-compliant
Details:
PostgreSQL documentation at
http://www.postgresql.org/docs/8.3/static/functions-matching.html describes
the various character classes, and they can be used to match or replace
strings with regexp support. However, the [:alnum:] and [:alpha:] character
classes are not UTF8-compliant, as shown in the examples below:
dockee=# show client_encoding;
client_encoding
-----------------
UTF8
(1 row)
dockee=# show lc_ctype;
lc_ctype
-------------
en_US.UTF-8
(1 row)
dockee=# select regexp_replace('bbu', '[[:alnum:]]', '', 'g');
regexp_replace
----------------
(1 row)
ovhdev=# select regexp_replace('bbu', '[[:alpha:]]', '', 'g');
regexp_replace
----------------
(1 row)
dockee=# select regexp_replace('bbu', $$\w$$, '', 'g');
regexp_replace
----------------
(1 row)
Only characters in the ASCII range are recognized as members of the
[:alnum:] character class, although non-ASCII letters are valid members too.
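A per-character check may help narrow down which characters the classes accept. The following is a minimal sketch and not part of the original session: the sample characters 'a', 'é' and '1' and the column aliases are assumptions chosen for illustration, and it assumes that the client and server encodings are both UTF8.

```sql
-- Hypothetical sample characters: one ASCII letter, one non-ASCII letter, one digit.
-- Each column reports whether the single character matches the given class.
SELECT ch,
       ch ~ '[[:alpha:]]' AS is_alpha,
       ch ~ '[[:alnum:]]' AS is_alnum,
       ch ~ $$\w$$        AS is_word
FROM (VALUES ('a'), ('é'), ('1')) AS t(ch);
```

Going by the behaviour reported above, the row for the non-ASCII letter would presumably come back false in all three columns on the affected setup, whereas a locale-aware implementation would be expected to report it as alphabetic.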