From: | tgl(at)postgresql(dot)org (Tom Lane) |
---|---|
To: | pgsql-committers(at)postgresql(dot)org |
Subject: | pgsql: Teach the regular expression functions to do case-insensitive |
Date: | 2009-12-01 21:00:24 |
Message-ID: | 20091201210024.B1393753FB7@cvs.postgresql.org |
Lists: | pgsql-committers |
Log Message:
-----------
Teach the regular expression functions to do case-insensitive matching and
locale-dependent character classification properly when the database encoding
is UTF8.

The previous coding worked okay in single-byte encodings, or in any case for
ASCII characters, but failed entirely on multibyte characters. The fix
assumes that the <wctype.h> functions use Unicode code points as the wchar
representation for Unicode, i.e., wchar matches pg_wchar.
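
To illustrate the assumption described above, here is a minimal sketch of how a classification or case-mapping wrapper could hand code points to the <wctype.h> functions. It is not the code from regc_locale.c; `sketch_wchar`, `sketch_wc_isalpha`, `sketch_wc_toupper`, and the `is_utf8` flag are hypothetical names standing in for pg_wchar and a database-encoding check.

```c
#include <ctype.h>
#include <limits.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical stand-in for the regex engine's wide-character type. */
typedef unsigned int sketch_wchar;

/*
 * Report whether c is alphabetic.  is_utf8 is a hypothetical flag that
 * stands in for "the database encoding is UTF8".
 */
static int
sketch_wc_isalpha(sketch_wchar c, int is_utf8)
{
    if (is_utf8)
    {
        /*
         * Assumption: wchar_t values are Unicode code points.  If wchar_t
         * is narrower than 32 bits, only pass code points that fit.
         */
        if (sizeof(wchar_t) >= 4 || c <= (sketch_wchar) 0xFFFF)
            return iswalpha((wint_t) c) != 0;
    }
    else
    {
        /* Other encodings: only single-byte values are classified. */
        if (c <= (sketch_wchar) UCHAR_MAX)
            return isalpha((unsigned char) c) != 0;
    }
    return 0;                   /* anything else is treated as non-alphabetic */
}

/* Map c to upper case under the same assumptions; unknown values pass through. */
static sketch_wchar
sketch_wc_toupper(sketch_wchar c, int is_utf8)
{
    if (is_utf8)
    {
        if (sizeof(wchar_t) >= 4 || c <= (sketch_wchar) 0xFFFF)
            return (sketch_wchar) towupper((wint_t) c);
    }
    else
    {
        if (c <= (sketch_wchar) UCHAR_MAX)
            return (sketch_wchar) toupper((unsigned char) c);
    }
    return c;
}
```

Results for non-ASCII code points depend on the process locale (as set with setlocale), so the sketch only helps when that locale's <wctype.h> tables really are keyed by Unicode code point.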

This is only a partial solution, since we're still stupid about non-ASCII
characters in multibyte encodings other than UTF8. The practical effect
of that is limited, however, since those cases are generally Far Eastern
glyphs for which concepts like case-folding don't apply anyway. Certainly
all or nearly all of the field reports of problems have been about UTF8.

A more general solution would require switching to the platform's wchar
representation for all regex operations, which is possible but would have
substantial disadvantages. Let's try this and see if it's sufficient in
practice.

Modified Files:
--------------
pgsql/src/backend/regex:
regc_locale.c (r1.9 -> r1.10)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/regex/regc_locale.c?r1=1.9&r2=1.10)
pgsql/src/include/regex:
regcustom.h (r1.7 -> r1.8)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/include/regex/regcustom.h?r1=1.7&r2=1.8)