Re: C locale versus en_US.UTF8. (Was: String comparision in PostgreSQL)

From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Aleksey Tsalolikhin <atsaloli(dot)tech(at)gmail(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: C locale versus en_US.UTF8. (Was: String comparision in PostgreSQL)
Date: 2012-08-29 20:32:35
Message-ID: CAOR=d=1x3L9+R31cqONkw767uDCRa7HExU7V2GFC8_RC=_YcaQ@mail.gmail.com
Lists: pgsql-general

On Wed, Aug 29, 2012 at 2:17 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> On Wed, Aug 29, 2012 at 12:52:50PM -0600, Scott Marlowe wrote:
>> On Wed, Aug 29, 2012 at 11:43 AM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>> > On Wed, Aug 29, 2012 at 10:31:21AM -0700, Aleksey Tsalolikhin wrote:
>> >> On Wed, Aug 29, 2012 at 9:45 AM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
>> >> > citext unfortunately doesn't allow for index optimization of LIKE
>> >> > queries, which IMNSHO defeats the whole purpose. so the best way
>> >> > remains to use lower() ...
>> >> > this will be index optimized and fast as long as you specified C
>> >> > locale for your database.
>> >>
>> >> What is the difference between C and en_US.UTF8, please? We see that
>> >> the same query (that invokes a sort) runs 15% faster under the C
>> >> locale. The output between C and en_US.UTF8 is identical. We're
>> >> considering moving our database from en_US.UTF8 to C, but we do deal
>> >> with internationalized text.
>> >
>> > Well, C has reduced overhead for string comparisons, but obviously
>> > doesn't work well for international characters. The single-byte
>> > encodings have somewhat less overhead than UTF8. You can try using C
>> > locales for databases that don't require non-ASCII characters.
>>
>> I think you're confusing encodings with locales. C is a locale. You
>
> I think technically C is a non-locale.

True. But it's NOT an encoding.

>> can have a database with a locale of C and UTF-8 encoding.
>>
>> create database clocale_utf8 encoding='UTF8' LC_COLLATE= 'C' template=template0;
>>
>> \l
>>      Name     |  Owner   | Encoding  |   Collate   |    Ctype    |   Access privileges
>> --------------+----------+-----------+-------------+-------------+-----------------------
>>  clocale_utf8 | smarlowe | UTF8      | C           | en_US.UTF-8 |
>>
>>
>> SQL_ASCII is the encoding equivalent of C locale, but it also allows
>> multi-byte characters.
>
> Yes, but what sort ordering do you get in that case?

Byte ordering.
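
To make "byte ordering" concrete, here is a minimal Python sketch, not PostgreSQL code. It assumes the strings are stored as UTF-8 and compared byte by byte, which is roughly what a C collation does. The function name and the sample words are made up for illustration.

```python
def byte_order_sort(words):
    """Sort strings by their raw UTF-8 bytes, ignoring any locale rules."""
    # Comparing the encoded bytes mimics a plain byte-wise comparison.
    return sorted(words, key=lambda s: s.encode("utf-8"))


if __name__ == "__main__":
    sample = ["apple", "Banana", "cherry", "Zebra", "éclair"]
    for word in byte_order_sort(sample):
        # Show each word next to the bytes that decided its position.
        print(word, word.encode("utf-8").hex())
```

In this ordering every uppercase ASCII letter sorts before every lowercase one, and anything outside ASCII sorts after all of ASCII. So C and en_US.UTF8 can return identical output on plain lowercase ASCII data and still diverge once mixed-case or accented text shows up.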
