Re: Concerning about Unicode-aware string handling

From: John R Pierce <pierce(at)hogranch(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Concerning about Unicode-aware string handling
Date: 2012-05-21 09:44:45
Message-ID: 4FBA0E8D.3090808@hogranch.com
Lists: pgsql-general

On 05/21/12 2:09 AM, Vincas Dargis wrote:
> We have problems (currently using 8.4, but also in latest 9.1.3) in
> our application with Unicode word symbols in Lithuanian ('ąčęėįšųūž'),
> Russian and of course potentially other languages.
>
> For example, regexp_replace('acząčž', E'\\W', '', 'g') removes ąčž.
>
> lower() and ~* comparisons work only with the locale that is set (no
> internationalization).
>
> Could we expect Unicode support in the near future? Or should we do quick
> hacks by reimplementing regexp_replace(), lower(), upper() and other
> string SQL functions using, for example, Qt libraries..? Or maybe
> there are some kind of simpler workarounds?

Is your database encoding UTF8? Is the language or environment you're
using to generate strings such as 'acząčž' also using UTF8?
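
Both questions can be checked from a psql session. The following is only a
sketch; the values it returns depend entirely on how your cluster and client
were set up.

```sql
-- Encoding the current database stores text in (UTF8 is what you want here).
SHOW server_encoding;

-- Encoding the client session declares for the strings it sends and receives.
SHOW client_encoding;

-- Encoding and locale of every database in the cluster (8.4 and later).
-- datctype is the character-classification locale that lower() and
-- upper() follow.
SELECT datname,
       pg_encoding_to_char(encoding) AS encoding,
       datcollate,
       datctype
FROM pg_database;
```

If client_encoding does not match what the application really sends, the
server will misinterpret the bytes even when the database itself is UTF8.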

PostgreSQL supports UTF-8 Unicode just fine. It does not directly
support the bastardized UTF-16 'unicode' implemented by Windows NT and
derivatives (2000, XP, 2003, Vista, 2008, 7), but on those platforms it
generally behaves fairly sanely as long as you realize UTF8 is its
native tongue.

Of course, the database has to be created as a UTF8 database. It's
possible to initialize the server cluster with the "C"/"POSIX" locale
and SQL_ASCII encoding, which says bytes-are-bytes and encodings are
unknown, or in various 8-bit encodings like LATIN1.
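
If the existing database turns out not to be UTF8, a new one can be created
alongside it. This is a minimal sketch: the database name myapp and the
locale name lt_LT.UTF-8 are assumptions for illustration, and locale names
are operating-system specific.

```sql
-- Hypothetical database name and locale; substitute a UTF-8 locale that
-- actually exists on your server (on Unix, `locale -a` lists them).
CREATE DATABASE myapp
    TEMPLATE   template0      -- needed when encoding or locale differ from template1
    ENCODING   'UTF8'
    LC_COLLATE 'lt_LT.UTF-8'
    LC_CTYPE   'lt_LT.UTF-8';
```

The per-database LC_COLLATE and LC_CTYPE options exist from 8.4 onward. The
encoding of an existing database cannot be changed in place, so the data has
to be dumped and restored into the new one. Re-running the regexp_replace()
and lower() tests there shows how much of the problem was the encoding and
locale setup.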

--
john r pierce N 37, W 122
santa cruz ca mid-left coast
