Re: UTF-8 collation on Windows?

From: Dev Kumkar <devdas(dot)kumkar(at)gmail(dot)com>
To: Daniel Verite <daniel(at)manitou-mail(dot)org>
Cc: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: UTF-8 collation on Windows?
Date: 2014-02-20 12:18:42
Message-ID: CALSLE1PCRGCkwHdQ+6auR6CLENADW2Z9n+A7R=SRax_Wyq8JUA@mail.gmail.com
Lists: pgsql-general

On Thu, Feb 20, 2014 at 4:34 PM, Daniel Verite <daniel(at)manitou-mail(dot)org> wrote:

> Despite windows-1252 being a monobyte encoding sharing most
> of LATIN1 codes and character set, it does not mean that
> English_United States.1252 is limited to this character set.
> You may use UTF-8 databases with that locale.
>
> Consider the 2nd paragraph of "Character Set Support"
> in the doc:
> http://www.postgresql.org/docs/current/static/multibyte.html
>
> "For C or POSIX locale, any character set is allowed, but for other
> locales there is only one character set that will work
> correctly. (On Windows, however, UTF-8 encoding can be used with
> any locale.)"
>
> This is a key difference with Unix when choosing a locale.
>
> As for getting the exact same sort order as Linux, it's not possible, but
> that's not a Windows-vs-Unix issue. If you used FreeBSD or MacOS X, some
> en_US.UTF-8 collation rules would differ from Linux's libc too, resulting
> in a different sort order for certain strings.
>

There is no issue with using the English_United States.1252 locale with a
UTF8 database. The point of discussion here is the sort order, and whether
there is a Windows code page that corresponds to UTF-8.
The page http://msdn.microsoft.com/en-us/library/dd317756%28VS.85%29.aspx
that I linked earlier lists such code pages, but creating a database with
them fails.

Overall, from the discussion so far and from whatever searching I could do
on the net and in the community, it looks like there is no code page on
Windows that corresponds to UTF-8 on Linux. If there is one, please let me
know.

Conclusion: I have basically decided to use UTF8 as the database encoding on
both Windows and Linux, and to set the collation to 'C'.
That way my customers on Linux and Windows at least see the same behavior
when sorting. Any gotchas here?
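
To make the question concrete, here is a rough sketch of the ordering I would expect from the 'C' collation. It assumes that 'C' compares strings byte by byte on their UTF-8 encoding. The helper name c_collation_sort and the sample words are made up for illustration, and this is plain Python imitating that rule, not PostgreSQL itself.

```python
def c_collation_sort(strings):
    """Sort strings by their raw UTF-8 bytes (assumed 'C' collation behavior)."""
    # Python compares bytes objects lexicographically by byte value,
    # which mimics a memcmp-style comparison.
    return sorted(strings, key=lambda s: s.encode("utf-8"))


# Hypothetical sample data: mixed case plus one accented word.
words = ["apple", "Zebra", "banana", "Apple", "éclair", "zebra"]
result = c_collation_sort(words)
print(result)
# ['Apple', 'Zebra', 'apple', 'banana', 'zebra', 'éclair']
```

If that assumption holds, all uppercase ASCII letters sort before all lowercase ones, and accented characters sort after plain ASCII, which is identical on both platforms but not what an en_US user would normally expect.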

Regards...
