Re: C locale versus en_US.UTF8. (Was: String comparision in PostgreSQL)

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Aleksey Tsalolikhin <atsaloli(dot)tech(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: C locale versus en_US.UTF8. (Was: String comparision in PostgreSQL)
Date: 2012-08-29 18:45:20
Message-ID: CAHyXU0wTuZMoNfjXz_QrVHQtsn0Pr2d2G9AaWpd09kpugTZJCA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Wed, Aug 29, 2012 at 12:43 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> On Wed, Aug 29, 2012 at 10:31:21AM -0700, Aleksey Tsalolikhin wrote:
>> On Wed, Aug 29, 2012 at 9:45 AM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
>> > citext unfortunately doesn't allow for index optimization of LIKE
>> > queries, which IMNSHO defeats the whole purpose. to the best way
>> > remains to use lower() ...
>> > this will be index optimized and fast as long as you specified C
>> > locale for your database.
>>
>> What is the difference between C and en_US.UTF8, please? We see that
>> the same query (that invokes a sort) runs 15% faster under the C
>> locale. The output between C and en_US.UTF8 is identical. We're
>> considering moving our database from en_US.UTF8 to C, but we do deal
>> with internationalized text.
>
> Well, C has reduced overhead for string comparisons, but obviously
> doesn't work well for international characters. The single-byte
> encodings have somewhat less overhead than UTF8. You can try using C
> locales for databases that don't require non-ASCII characters.

To add:
The middle ground I usually choose is to have a database encoding of
UTF8 but with the C (aka POSIX) locale. This gives you the ability to
store any unicode but indexing operations will use the faster C string
comparison operations for a significant performance boost --
especially for partial string searches on an indexed column. This is
an even more attractive option in 9.1 with the ability to specify
specific collations at runtime.

merlin

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Scott Marlowe 2012-08-29 18:52:50 Re: C locale versus en_US.UTF8. (Was: String comparision in PostgreSQL)
Previous Message Bruce Momjian 2012-08-29 17:43:34 Re: C locale versus en_US.UTF8. (Was: String comparision in PostgreSQL)