| From: | Bruce Momjian <bruce(at)momjian(dot)us> |
|---|---|
| To: | Merlin Moncure <mmoncure(at)gmail(dot)com> |
| Cc: | Aleksey Tsalolikhin <atsaloli(dot)tech(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org |
| Subject: | Re: C locale versus en_US.UTF8. (Was: String comparision in PostgreSQL) |
| Date: | 2012-08-29 20:15:56 |
| Message-ID: | 20120829201556.GB8748@momjian.us |
| Lists: | pgsql-general |
On Wed, Aug 29, 2012 at 01:45:20PM -0500, Merlin Moncure wrote:
> On Wed, Aug 29, 2012 at 12:43 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> > On Wed, Aug 29, 2012 at 10:31:21AM -0700, Aleksey Tsalolikhin wrote:
> >> On Wed, Aug 29, 2012 at 9:45 AM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
> >> > citext unfortunately doesn't allow for index optimization of LIKE
> >> > queries, which IMNSHO defeats the whole purpose. So the best way
> >> > remains to use lower() ...
> >> > This will be index optimized and fast as long as you specified C
> >> > locale for your database.
> >>
> >> What is the difference between C and en_US.UTF8, please? We see that
> >> the same query (that invokes a sort) runs 15% faster under the C
> >> locale. The output between C and en_US.UTF8 is identical. We're
> >> considering moving our database from en_US.UTF8 to C, but we do deal
> >> with internationalized text.
> >
> > Well, C has reduced overhead for string comparisons, but obviously
> > doesn't work well for international characters. The single-byte
> > encodings have somewhat less overhead than UTF8. You can try using C
> > locales for databases that don't require non-ASCII characters.
>
> To add:
> The middle ground I usually choose is to have a database encoding of
> UTF8 but with the C (aka POSIX) locale. This gives you the ability to
> store any Unicode text, but indexing operations will use the faster C string
> store any unicode but indexing operations will use the faster C string
> comparison operations for a significant performance boost --
> especially for partial string searches on an indexed column. This is
> an even more attractive option in 9.1 with the ability to specify
> collations at runtime.
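A minimal Python sketch of why C-locale comparison helps the `lower(col) LIKE 'al%'` case: with plain byte-wise ordering, every key that starts with a given prefix falls in one contiguous range, so the search roughly reduces to `key >= prefix AND key < next_prefix`, i.e. an index range scan. A sorted list searched with `bisect` stands in for the B-tree here; the function names and sample data are hypothetical, and this illustrates the principle rather than PostgreSQL's planner code.

```python
from bisect import bisect_left
from typing import List, Optional


def prefix_upper_bound(prefix: bytes) -> Optional[bytes]:
    """Smallest byte string that sorts after every string starting with `prefix`.

    Drops trailing 0xFF bytes, then increments the last remaining byte.
    Returns None when no such bound exists (empty or all-0xFF prefix).
    """
    p = bytearray(prefix)
    while p:
        if p[-1] < 0xFF:
            p[-1] += 1
            return bytes(p)
        p.pop()
    return None


def prefix_search(sorted_keys: List[bytes], prefix: bytes) -> List[bytes]:
    """Keys starting with `prefix`, found like an index range scan:
    key >= prefix AND key < prefix_upper_bound(prefix)."""
    lo = bisect_left(sorted_keys, prefix)
    upper = prefix_upper_bound(prefix)
    hi = len(sorted_keys) if upper is None else bisect_left(sorted_keys, upper)
    return sorted_keys[lo:hi]


# Hypothetical "index on lower(name)": lowercased UTF-8 keys kept in
# byte-wise (C locale) order.
names = ["Alice", "alfred", "Albert", "bob", "ALINA", "Álvaro"]
index = sorted(n.lower().encode("utf-8") for n in names)

matches = prefix_search(index, "al".encode("utf-8"))
print([m.decode("utf-8") for m in matches])
# prints ['albert', 'alfred', 'alice', 'alina']
```

Note that `álvaro` is not returned: in C order "á" is simply a different byte sequence from "a", which is the trade-off for international text mentioned above.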
Do you get proper sort ordering in this case, or only when you specify
the proper collation at runtime?
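A small Python sketch of what C ordering looks like on UTF-8 data, assuming a UTF8-encoded database with the C collation as described above: values compare as raw bytes, and because UTF-8 preserves code point order, the result is plain Unicode code point order rather than dictionary order. The word list is made up for illustration.

```python
words = ["rope", "résumé", "Resume", "resume", "zebra", "Zebra"]

# C collation: compare the raw UTF-8 bytes, like memcmp()/strcmp().
c_order = sorted(words, key=lambda w: w.encode("utf-8"))

# UTF-8 byte order matches Unicode code point order, so sorting by
# code points gives the same result.
codepoint_order = sorted(words, key=lambda w: [ord(ch) for ch in w])
assert c_order == codepoint_order

print(c_order)
# prints ['Resume', 'Zebra', 'resume', 'rope', 'résumé', 'zebra']
```

Uppercase letters sort before all lowercase ones and `résumé` lands after `rope`. A linguistic collation such as en_US.UTF-8 would typically keep resume, Resume and résumé together; the exact order depends on the operating system's locale data, so it is not reproduced here. In 9.1 and later that order can be requested for a particular query with a `COLLATE` clause.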
--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +