Re: Concerning about Unicode-aware string handling

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Vincas Dargis <vindrg(at)gmail(dot)com>
Cc:	pgsql-general(at)postgresql(dot)org, pierce(at)hogranch(dot)com
Subject:	Re: Concerning about Unicode-aware string handling
Date:	2012-05-21 14:04:15
Message-ID:	14801.1337609055@sss.pgh.pa.us
Lists:	pgsql-general

Vincas Dargis <vindrg(at)gmail(dot)com> writes:
> Database created using:
> initdb -D ../data -E utf-8 -U postgres

That looks fairly dangerous, as it will absorb the database's locale
settings (particularly LC_CTYPE, which is what you care about for these
operations) from your shell environment. If the environment locale is
not for UTF8 encoding then it won't work at all. Best to specify a
--locale switch as well. See
http://www.postgresql.org/docs/9.1/static/charset.html
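
As a rough sketch of the point above: before running initdb you can check which character encoding the shell's locale implies, since that is what initdb inherits when no --locale switch is given. The helper name is_utf8_charmap and the locale name en_US.UTF-8 are assumptions for illustration only; use whatever `locale -a` lists on your system.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given charmap name denotes UTF-8.
is_utf8_charmap() {
    case "$1" in
        UTF-8|utf-8|UTF8|utf8) return 0 ;;
        *) return 1 ;;
    esac
}

# Charmap of the locale that initdb would inherit from this shell.
env_charmap=$(locale charmap 2>/dev/null)

if is_utf8_charmap "$env_charmap"; then
    echo "environment locale uses UTF-8"
else
    echo "environment locale uses '${env_charmap}', not UTF-8; pass --locale to initdb"
fi

# Explicit form, not executed here. en_US.UTF-8 is an assumed locale name.
# initdb -D ../data -E UTF8 --locale=en_US.UTF-8 -U postgres
```

Passing --locale explicitly means the cluster no longer depends on whatever environment the command happens to run in.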

> But regexp_replace issue is still there. Regexp "\w" understands only
> as "ascii word character" ?

Locale-specific character classes in regexps are not terribly bright
about UTF8, because historically that code has not considered any
character codes above 255 :-(. So in UTF8 you only got correct behavior
for ASCII and LATIN1 characters. 9.2 will be better though not perfect:
http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=e00f68e49
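
One way to see this on your own server is to run \w against a string that mixes ASCII letters with letters whose code points are above 255 (for example "ą" is U+0105 and "ž" is U+017E). This is only a sketch: the sample string is made up, and it assumes psql can connect as the postgres user and that standard_conforming_strings is on (the 9.1 default), so the backslash needs no doubling.

```shell
#!/bin/sh
# Sample query: replace every character matched by \w with 'x'.
# Letters left unreplaced are ones the regex engine did not treat as word characters.
query="SELECT regexp_replace('ąžuolas oak', '\w', 'x', 'g');"

if command -v psql >/dev/null 2>&1; then
    # Assumes a running server reachable with default connection settings.
    psql -U postgres -c "$query" || echo "query failed (is the server running?)"
else
    echo "psql not found; query to try: $query"
fi
```

Comparing the result on a pre-9.2 server with a 9.2 one should show the difference described above.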

regards, tom lane

In response to

Re: Concerning about Unicode-aware string handling at 2012-05-21 13:31:56 from Vincas Dargis
