Re: UTF-8 collation on Windows?

From: "Daniel Verite" <daniel(at)manitou-mail(dot)org>
To: "Dev Kumkar" <devdas(dot)kumkar(at)gmail(dot)com>
Cc: "Adrian Klaver" <adrian(dot)klaver(at)aklaver(dot)com>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: UTF-8 collation on Windows?
Date: 2014-02-20 11:04:54
Message-ID: c1c35ef6-6d75-4124-836d-5e308202dff5@mm
Lists: pgsql-general

Dev Kumkar wrote:

> Succeeds but as replied earlier it creates database with LC_COLLATE =
> 'English_United States.1252' which corresponds to Latin1.

Although windows-1252 is a single-byte encoding that shares most of
its codes and characters with LATIN1, that does not mean that
English_United States.1252 is limited to this character set.
You can use UTF-8 databases with that locale.

Consider the 2nd paragraph of "Character Set Support"
in the doc:
http://www.postgresql.org/docs/current/static/multibyte.html

"For C or POSIX locale, any character set is allowed, but for other
locales there is only one character set that will work
correctly. (On Windows, however, UTF-8 encoding can be used with
any locale.)"

This is a key difference from Unix when choosing a locale.
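To make the quoted rule concrete, here is a simplified Python model of it.
It is only a sketch: the function names and the small suffix-to-encoding
table are hypothetical, and PostgreSQL's real locale/encoding check is more
involved than parsing the part of the locale name after the dot.

```python
from typing import Optional

# Hypothetical, deliberately tiny table mapping a locale's charset suffix
# to a PostgreSQL encoding name (illustration only, not PostgreSQL's logic).
_SUFFIX_TO_ENCODING = {
    "utf-8": "UTF8",
    "utf8": "UTF8",
    "1252": "WIN1252",
    "iso-8859-1": "LATIN1",
    "iso88591": "LATIN1",
}


def locale_encoding(locale_name: str) -> Optional[str]:
    """Return the encoding implied by the suffix after '.' in a locale name."""
    _, dot, suffix = locale_name.partition(".")
    if not dot:
        return None
    return _SUFFIX_TO_ENCODING.get(suffix.lower())


def encoding_allowed(locale_name: str, encoding: str, on_windows: bool) -> bool:
    """Simplified model of the rule quoted from 'Character Set Support'."""
    # For C or POSIX locale, any character set is allowed.
    if locale_name in ("C", "POSIX"):
        return True
    # On Windows, UTF-8 encoding can be used with any locale.
    if on_windows and encoding == "UTF8":
        return True
    # Otherwise only the locale's own character set works correctly.
    return locale_encoding(locale_name) == encoding


print(encoding_allowed("English_United States.1252", "UTF8", on_windows=True))
print(encoding_allowed("English_United States.1252", "UTF8", on_windows=False))
```

The first call reflects the Windows case discussed above (UTF-8 with a
.1252 locale is fine); the second shows that the same pairing would not
be accepted under the Unix rule in this model.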

As for getting exactly the same sort order as on Linux, that's not possible,
but it's not a Windows-vs-Unix issue. If you used FreeBSD or Mac OS X, some
en_US.UTF-8 collation rules would also differ from those of Linux's libc,
resulting in a different sort order for certain strings.

Best regards,
--
Daniel
PostgreSQL-powered mail user agent and storage: http://www.manitou-mail.org
