Re: Per-column collation

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Per-column collation
Date: 2010-11-16 19:00:47
Message-ID: AANLkTimbcnWjUHKGGZZRgiptSLXNwfXT2MCsEFVMzUM6@mail.gmail.com
Lists: pgsql-hackers

Hello

2010/11/16 Peter Eisentraut <peter_e(at)gmx(dot)net>:
> On mån, 2010-11-15 at 23:13 +0100, Pavel Stehule wrote:
>> a) default encoding for collate isn't same as default encoding of database
>>
>> it's minimally not friendly - mostly used encoding is UTF8, but in
>> most cases users should to write locale.utf8.
>
> I don't understand what you are trying to say.  Please provide more
> detail.

See below.

>
>> b) there is bug - default collate (database collate is ignored)
>>
>>
>> postgres=# show lc_collate;
>>  lc_collate
>> ────────────
>>  cs_CZ.UTF8
>> (1 row)
>>
>> Time: 0.518 ms
>> postgres=# select * from jmena order by v;
>>      v
>> ───────────
>>  Chromečka
>>  Crha
>>  Drobný
>>  Čečetka
>> (4 rows)
>>
>> postgres=# select * from jmena order by v collate "cs_CZ.utf8";
>>      v
>> ───────────
>>  Crha
>>  Čečetka
>>  Drobný
>>  Chromečka
>> (4 rows)
>>
>> both result should be same.
>
> I tried to reproduce this here but got the expected results.  Could you
> try to isolate a complete test script?
>

I can't reproduce it now either, on a different system and computer.
Maybe I did something wrong. Sorry.

>> isn't there problem in case sensitive collate name? When I use a
>> lc_collate value, I got a error message
>>
>> postgres=# select * from jmena order by v collate "cs_CZ.UTF8";
>> ERROR:  collation "cs_CZ.UTF8" for current database encoding "UTF8"
>> does not exist
>> LINE 1: select * from jmena order by v collate "cs_CZ.UTF8";
>>
>> problem is when table is created without explicit collate.
>
> Well, I agree it's not totally nice, but we have to do something, and I
> think it's logical to use the locale names as collation names by
> default, and collation names are SQL identifiers.  Do you have any ideas
> for improving this?

Yes - my first question is: why do we need to specify the encoding when
only one encoding is supported? I can't use cs_CZ.iso88592 when my
database uses UTF8 - btw, the error message there is wrong:

yyy=# select * from jmena order by jmeno collate "cs_CZ.iso88592";
ERROR:  collation "cs_CZ.iso88592" for current database encoding "UTF8" does not exist
LINE 1: select * from jmena order by jmeno collate "cs_CZ.iso88592";
^

I don't know why, but the preferred encoding for Czech is now iso88592,
and I can't use it with my database, so I can't use the names "czech"
or "cs_CZ". I always have to use the full name "cs_CZ.utf8". That's
wrong. Moreover, from this moment my application depends on the
encoding that was used first: I can't change the encoding without
rewriting SQL statements, because the encoding is hard-coded there (in
the COLLATE clause).

So I don't understand why you fill the pg_collation table with
thousands of collations that cannot be used. If I use UTF8, then there
should be only UTF8-based collations. And if you need to work with a
wide set of collations, then I am for preferring UTF8, at least for the
Central European region. If somebody wants to use collations here, he
will use a combination of cs, de, and en, so he has to use latin2 and
latin1, or UTF8. I think the encoding should not be part of the
collation wherever that is possible.
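
To illustrate the point about unusable entries, a query along the
following lines would list only the collations that match the database
encoding. It is a sketch, not something taken from the patch: it assumes
pg_collation has the columns collname and collencoding, with
collencoding = -1 meaning "valid for any encoding", as in released
PostgreSQL catalogs.

```sql
-- Sketch only: list collations usable with the current database encoding.
-- Assumes pg_collation exposes collname and collencoding, where
-- collencoding = -1 means the collation works with any encoding.
SELECT c.collname,
       pg_encoding_to_char(c.collencoding) AS encoding
FROM pg_collation AS c
WHERE c.collencoding = -1
   OR c.collencoding = (SELECT d.encoding
                        FROM pg_database AS d
                        WHERE d.datname = current_database())
ORDER BY c.collname;
```

Any row of pg_collation that this query does not return is an entry
that cannot be named in a COLLATE clause in this database.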

Regards

Pavel
