Re: Encoding/collation question

From: Karsten Hilbert <Karsten(dot)Hilbert(at)gmx(dot)net>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Encoding/collation question
Date: 2019-12-12 09:37:13
Message-ID: 20191212093713.GA3164@hermes.hilbert.loc
Lists: pgsql-general

On Thu, Dec 12, 2019 at 05:03:59AM +0000, Andrew Gierth wrote:

> Rich> I doubt that my use will notice meaningful differences. Since
> Rich> there are only two or three databases in UTF8 and its collation
> Rich> perhaps I'll convert those to LATIN1 and C.
>
> Note that it's perfectly fine to use UTF8 encoding and C collation (this
> has the effect of sorting strings in Unicode codepoint order); this is
> as fast for comparisons as LATIN1/C is.
>
> For those cases where you need data to be sorted in a
> culturally-meaningful order rather than in codepoint order, you can set
> collations on specific columns or in individual queries.

Nice, thanks for pointing that out. One addition: while this
may seem like "the" magic bullet, it should be noted that one
will need additional indexes, created with the collation in
question, for culturally-meaningful ORDER BY sorts to be fast
(whereas with a non-C default collation one gets indexes that
are culturally meaningful for that one non-C locale by
default).
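
To make "codepoint order" from the quoted text a bit more
tangible, here is a minimal sketch in plain Python (no
PostgreSQL involved; the word list is made up for
illustration). It assumes, as I understand it, that C
collation boils down to a bytewise comparison, which for
UTF8-encoded text gives the same result as comparing
codepoints:

```python
# Sketch only: illustrates codepoint ordering, not PostgreSQL itself.
# The sample words are hypothetical.
words = ["zebra", "Zebra", "apple", "Äpfel"]

# Python compares str values codepoint by codepoint.
by_codepoint = sorted(words)

# Sorting by the UTF-8 encoded bytes yields the same order, because
# UTF-8 preserves codepoint order under bytewise comparison.
by_utf8_bytes = sorted(words, key=lambda w: w.encode("utf-8"))

print(by_codepoint)                   # ['Zebra', 'apple', 'zebra', 'Äpfel']
print(by_codepoint == by_utf8_bytes)  # True
```

Note how "Zebra" sorts before "apple" and "Äpfel" lands last,
which is hardly what a German speaker would expect. That is
the case where a per-column or per-query collation, plus a
matching index, comes in.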

Question: is the C collation expected to be future-proof /
rock-solid / stable -- like UTF8 is as an encoding choice -- or
could it end up the way the SQL_ASCII encoding did: "Yeah, we
support it, it's been in use a long time, it should work,
but, nah, one doesn't really want to choose it over UTF8 if
at all possible, or one should at least know *exactly* what
one is doing, and BTW YMMV"?

Karsten
--
GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B
