From: | "Octavio Alvarez" <alvarezp(at)octavio(dot)ods(dot)org> |
---|---|
To: | "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: LC_COLLATE=es_MX in PgSQL 7.3.2 |
Date: | 2003-06-12 23:10:31 |
Message-ID: | 1702.63.84.67.3.1055459431.squirrel@doogie.ods.org |
Lists: | pgsql-general |
Tom Lane said:
>> I'm using PGSQL 7.3.2 under Redhat Linux 8.0. The database was
>> initialized with --lc-collate=es_MX.
>
> How about --lc-ctype? I think that accent handling would be driven by
> LC_CTYPE not LC_COLLATE.
Maybe it's not the accents after all. I ran the following tests without
accents.
Okay. Now, I tried several combinations, including --locale=es_MX and
--lc-collate=es_MX --lc-ctype=es_MX, and got the same result.
I would like to point out something (still on PG 7.3.2).
I tried the following with --locale=es_MX, with --locale=en_US, and with
--locale=en_US.UTF-8:
alvarezp=# select * from t order by p asc, m asc;
p | m
-------+-------
octav | alvar
OCTAV | ALVAA
OCTAV | ALVAZ
octia | alvra
OCTIa | ALVAa
OCTIb | ALVZa
OCTIb | ALVZa
octic | alvra
OCTIc | ALVAa
octvi | alvra
OCTVI | ALVAa
OCTVI | ALVZa
(12 rows)
No accents here. I would have expected:
p | m
-------+-------
OCTAV | ALVAA
octav | alvar
OCTAV | ALVAZ
OCTIa | ALVAa
octia | alvra
OCTIb | ALVZa
OCTIb | ALVZa
OCTIc | ALVAa
octic | alvra
OCTVI | ALVAa
octvi | alvra
OCTVI | ALVZa
(12 rows)
--locale=C gives:
p | m
-------+-------
OCTAV | ALVAA
OCTAV | ALVAZ
OCTIa | ALVAa
OCTIb | ALVZa
OCTIb | ALVZa
OCTIc | ALVAa
OCTVI | ALVAa
OCTVI | ALVZa
octav | alvar
octia | alvra
octic | alvra
octvi | alvra
(12 rows)
which I think is correct for that locale. Well, whatever.
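With --locale=C the result is plain byte order, uppercase before lowercase. As a minimal sketch, assuming Python 3 on the same machine, the same ordering can be reproduced outside PostgreSQL by calling libc's collation through Python's `locale` module (`locale.strxfrm()` wraps libc's `wcsxfrm()`). The `sort_rows` helper and the `ROWS` list are hypothetical names for illustration; the tuple key mimics `ORDER BY p, m`.

```python
import locale

# Rows of table t, in the order psql returned them under es_MX (see above).
ROWS = [
    ("octav", "alvar"), ("OCTAV", "ALVAA"), ("OCTAV", "ALVAZ"),
    ("octia", "alvra"), ("OCTIa", "ALVAa"), ("OCTIb", "ALVZa"),
    ("OCTIb", "ALVZa"), ("octic", "alvra"), ("OCTIc", "ALVAa"),
    ("octvi", "alvra"), ("OCTVI", "ALVAa"), ("OCTVI", "ALVZa"),
]


def sort_rows(rows, locale_name):
    """Sort rows like ORDER BY p, m, using libc collation for locale_name.

    locale.strxfrm() wraps libc's wcsxfrm(); comparing transformed strings
    should give the same order as strcoll() under the same LC_COLLATE.
    Raises locale.Error if locale_name is not installed.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # query only, no change
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        # Tuple keys compare column p first; m is consulted only when the
        # transformed p values are equal.
        return sorted(rows, key=lambda row: tuple(locale.strxfrm(c) for c in row))
    finally:
        # Put the process-wide collation setting back the way it was.
        locale.setlocale(locale.LC_COLLATE, previous)


if __name__ == "__main__":
    for p, m in sort_rows(ROWS, "C"):
        print(f"{p} | {m}")
```

Calling `sort_rows(ROWS, "es_MX")` instead (or whatever spelling `locale -a` reports on the system) would show whether libc on its own produces the psql ordering; `setlocale` raises `locale.Error` if that locale is not installed.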
> In any case, this is not a Postgres bug unless
> you can show that other programs using the same LC_foo settings behave
> differently. We punt pretty much all locale-related processing to
> subroutines in libc.
How could I test that? I tried the following. Notice how the "octav"
rows come out in the order I expected, but I don't know whether sort is
actually splitting the fields or treating the whole line as one key.
[alvarezp@pgsql alvarezp]$ sort -t : < o
OCTAV:ALVAA
octav:alvar
OCTAV:ALVAZ
OCTIa:ALVAa
octia:alvra
OCTIb:ALVZa
OCTIb:ALVZa
OCTIc:ALVAa
octic:alvra
OCTVI:ALVAa
octvi:alvra
OCTVI:ALVZa
Whatever. Take a look at this one:
[alvarezp@pgsql alvarezp]$ sort -k 1,1 < o
octav alvar
OCTAV ALVAA
OCTAV ALVAZ
octia alvra
OCTIa ALVAa
OCTIb ALVZa
OCTIb ALVZa
octic alvra
OCTIc ALVAa
octvi alvra
OCTVI ALVAa
OCTVI ALVZa
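The two `sort` runs compare different things. In GNU `sort`, `-t :` without `-k` still treats the whole line as one key (`-t` only affects how `-k` fields are split), whereas `-k 1,1` compares just the first field and falls back to the whole line only on a tie. The sketch below uses a toy collation, an assumption inferred from the output above rather than glibc's actual rules: compare ignoring case first, then let case break ties with lowercase first. `toy_key`, `per_column`, and `whole_line` are hypothetical names.

```python
# Toy two-level comparison, an ASSUMPTION modeled on the output above and
# not glibc's real algorithm: letters are compared ignoring case first;
# only if that ties does case decide, with lowercase before uppercase.
def toy_key(s):
    return (s.lower(), tuple(ch.isupper() for ch in s))


# Rows of table t, listed here in C-locale (byte) order as arbitrary input.
ROWS = [
    ("OCTAV", "ALVAA"), ("OCTAV", "ALVAZ"), ("OCTIa", "ALVAa"),
    ("OCTIb", "ALVZa"), ("OCTIb", "ALVZa"), ("OCTIc", "ALVAa"),
    ("OCTVI", "ALVAa"), ("OCTVI", "ALVZa"), ("octav", "alvar"),
    ("octia", "alvra"), ("octic", "alvra"), ("octvi", "alvra"),
]


def per_column(rows):
    """Like ORDER BY p, m: m is consulted only when p compares equal."""
    return sorted(rows, key=lambda r: (toy_key(r[0]), toy_key(r[1])))


def whole_line(rows):
    """Like `sort -t :` without -k: each "p:m" line is a single key."""
    return sorted(rows, key=lambda r: toy_key(r[0] + ":" + r[1]))


if __name__ == "__main__":
    print("per column:", per_column(ROWS)[:3])
    print("whole line:", whole_line(ROWS)[:3])
```

Under this model, `per_column` reproduces the psql order and `whole_line` reproduces the `sort -t :` order. The reason is that "octav" and "OCTAV" differ only in case, so comparing column p on its own settles the tie by case before m is ever looked at; on the whole line, the letter difference between "alvar" and "ALVAA" outranks case.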
I don't know whether deciding which keys are equal (in this case
octav=OCTAV=OCTAV) should be done by PostgreSQL or by libc. I also don't
know whether I am wrong to assume octav=OCTAV. For alphabetical sorting, I
would expect the comparison to be case-insensitive.
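A case-insensitive ordering has to be requested explicitly, column by column. A minimal sketch in Python (the `casefolded_order` helper is hypothetical): lowercasing each column before comparing produces the order listed above as expected. In SQL the analogue would presumably be `ORDER BY lower(p), lower(m)`, untested here against 7.3.2; unlike Python's stable `sorted()`, SQL makes no promise about the relative order of rows that tie after lowercasing unless a further key such as `p` is appended.

```python
# Rows of table t, in the order psql returned them under es_MX.
ROWS = [
    ("octav", "alvar"), ("OCTAV", "ALVAA"), ("OCTAV", "ALVAZ"),
    ("octia", "alvra"), ("OCTIa", "ALVAa"), ("OCTIb", "ALVZa"),
    ("OCTIb", "ALVZa"), ("octic", "alvra"), ("OCTIc", "ALVAa"),
    ("octvi", "alvra"), ("OCTVI", "ALVAa"), ("OCTVI", "ALVZa"),
]


def casefolded_order(rows):
    """Order rows by p, then m, ignoring case in both columns.

    Rows that are identical after lowercasing keep their input order,
    because Python's sorted() is stable.
    """
    return sorted(rows, key=lambda r: (r[0].lower(), r[1].lower()))


if __name__ == "__main__":
    for p, m in casefolded_order(ROWS):
        print(f"{p} | {m}")
```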
Octavio.