Re: LC_COLLATE=es_MX in PgSQL 7.3.2

From: "Octavio Alvarez" <alvarezp(at)octavio(dot)ods(dot)org>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: LC_COLLATE=es_MX in PgSQL 7.3.2
Date: 2003-06-12 23:10:31
Message-ID: 1702.63.84.67.3.1055459431.squirrel@doogie.ods.org
Lists: pgsql-general

Tom Lane said:
>> I'm using PGSQL 7.3.2 under Redhat Linux 8.0. The database was
>> initialized with --lc-collate=es_MX.
>
> How about --lc-ctype? I think that accent handling would be driven by
> LC_CTYPE not LC_COLLATE.

Maybe it's not the accents after all. I ran the following tests without
accents.

Okay. Now, I tried several combinations, including --locale=es_MX and
--lc-collate=es_MX --lc-ctype=es_MX, and got the same result.

I would like to point out something: (still PG 7.3.2)

I tried the following with --locale=es_MX, with --locale=en_US, with
--locale=en_US.UTF-8.

alvarezp=# select * from t order by p asc, m asc;
   p   |   m
-------+-------
 octav | alvar
 OCTAV | ALVAA
 OCTAV | ALVAZ
 octia | alvra
 OCTIa | ALVAa
 OCTIb | ALVZa
 OCTIb | ALVZa
 octic | alvra
 OCTIc | ALVAa
 octvi | alvra
 OCTVI | ALVAa
 OCTVI | ALVZa
(12 rows)

No accents here. I would have expected:
   p   |   m
-------+-------
 OCTAV | ALVAA
 octav | alvar
 OCTAV | ALVAZ
 OCTIa | ALVAa
 octia | alvra
 OCTIb | ALVZa
 OCTIb | ALVZa
 OCTIc | ALVAa
 octic | alvra
 OCTVI | ALVAa
 octvi | alvra
 OCTVI | ALVZa
(12 rows)
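
For illustration, the expected ordering above is what a case-insensitive comparison on p, then on m, would produce. The following is a minimal Python sketch of that rule, not a statement about how PostgreSQL or libc sorts; the name case_insensitive_order is hypothetical and the rows are copied from the table.

```python
# Rows copied from the query output; the order of this list does not matter.
ROWS = [
    ("octav", "alvar"), ("OCTAV", "ALVAA"), ("OCTAV", "ALVAZ"),
    ("octia", "alvra"), ("OCTIa", "ALVAa"), ("OCTIb", "ALVZa"),
    ("OCTIb", "ALVZa"), ("octic", "alvra"), ("OCTIc", "ALVAa"),
    ("octvi", "alvra"), ("OCTVI", "ALVAa"), ("OCTVI", "ALVZa"),
]


def case_insensitive_order(rows):
    """Sort (p, m) pairs by p, then m, ignoring case in both columns."""
    return sorted(rows, key=lambda row: (row[0].lower(), row[1].lower()))


if __name__ == "__main__":
    for p, m in case_insensitive_order(ROWS):
        print(p, "|", m)
```

Under this rule "octav" and "OCTAV" are equal keys, so the m column alone decides the order inside each group.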

--locale=C gives:
   p   |   m
-------+-------
 OCTAV | ALVAA
 OCTAV | ALVAZ
 OCTIa | ALVAa
 OCTIb | ALVZa
 OCTIb | ALVZa
 OCTIc | ALVAa
 OCTVI | ALVAa
 OCTVI | ALVZa
 octav | alvar
 octia | alvra
 octic | alvra
 octvi | alvra
(12 rows)

which I think is correct for that locale. Well, whatever.

> In any case, this is not a Postgres bug unless
> you can show that other programs using the same LC_foo settings behave
> differently. We punt pretty much all locale-related processing to
> subroutines in libc.

How could I test that? I tried the following. Notice how the "octav"
values are correctly sorted, but I don't know whether sort is actually
separating the fields or treating the whole line as one key.

[alvarezp@pgsql alvarezp]$ sort -t : < o
OCTAV:ALVAA
octav:alvar
OCTAV:ALVAZ
OCTIa:ALVAa
octia:alvra
OCTIb:ALVZa
OCTIb:ALVZa
OCTIc:ALVAa
octic:alvra
OCTVI:ALVAa
octvi:alvra
OCTVI:ALVZa

Whatever. Take a look at this one:

[alvarezp@pgsql alvarezp]$ sort -k 1,1 < o
octav alvar
OCTAV ALVAA
OCTAV ALVAZ
octia alvra
OCTIa ALVAa
OCTIb ALVZa
OCTIb ALVZa
octic alvra
OCTIc ALVAa
octvi alvra
OCTVI ALVAa
OCTVI ALVZa
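
One possible reading of this output, which matches the PostgreSQL result, is a two-level comparison: letters are compared first with case ignored, and case is used only to break ties, with lowercase ordered before uppercase. That rule is an assumption made for illustration, not a description of how libc implements collation. A minimal Python sketch (two_level_key and model_order are hypothetical names):

```python
# Rows copied from the sort input; the order of this list does not matter.
ROWS = [
    ("OCTAV", "ALVAA"), ("octav", "alvar"), ("OCTAV", "ALVAZ"),
    ("OCTIa", "ALVAa"), ("octia", "alvra"), ("OCTIb", "ALVZa"),
    ("OCTIb", "ALVZa"), ("OCTIc", "ALVAa"), ("octic", "alvra"),
    ("OCTVI", "ALVAa"), ("octvi", "alvra"), ("OCTVI", "ALVZa"),
]


def two_level_key(text):
    """Assumed model: level 1 ignores case, level 2 breaks ties by case.

    swapcase() flips the ASCII code-point order (which puts uppercase
    first), so lowercase letters win the tie-break.
    """
    return (text.lower(), text.swapcase())


def model_order(rows):
    """Sort (p, m) pairs by p, then m, using the assumed two-level key."""
    return sorted(rows, key=lambda row: (two_level_key(row[0]),
                                         two_level_key(row[1])))


if __name__ == "__main__":
    for p, m in model_order(ROWS):
        print(p, m)
```

In this model "octav" and "OCTAV" are never equal keys, so the second column is consulted only between rows whose first column is identical, case included.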

I don't know whether deciding which keys are equal (in this case
octav=OCTAV=OCTAV) is up to PostgreSQL or to libc. I also don't know
whether I am wrong to assume octav=OCTAV. For alphabetic sorting, I would
expect the comparison to be case-insensitive.
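
One way to check the equality question outside PostgreSQL is to call the C library's strcoll() directly on the two keys. A minimal sketch using Python's locale module, which wraps that call; collate_cmp is a hypothetical helper, and whether es_MX, en_US, or en_US.UTF-8 are installed on a given machine is an assumption, so missing locales are reported rather than treated as errors.

```python
import locale


def collate_cmp(a, b, locale_name):
    """Compare a and b with the C library's collation under locale_name.

    Returns a negative number, zero, or a positive number.
    Raises locale.Error if the locale is not installed on this machine.
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # remember current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        return locale.strcoll(a, b)
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)  # always restore it


if __name__ == "__main__":
    for name in ("C", "es_MX", "en_US", "en_US.UTF-8"):
        try:
            result = collate_cmp("octav", "OCTAV", name)
        except locale.Error:
            print(name, "is not installed here")
            continue
        print(name, "equal" if result == 0 else "different", result)
```

A result of zero would mean the library treats the two keys as equal under that locale; any other value means it orders one before the other.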

Octavio.
