From: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> |
---|---|
To: | Peter Eisentraut <peter_e(at)gmx(dot)net> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Per-column collation, work in progress |
Date: | 2010-09-23 09:55:21 |
Message-ID: | AANLkTinFt2U0NibM0UX=Pw-bSTTCpUp2o9pH9NdPj_+m@mail.gmail.com |
Lists: | pgsql-hackers |
2010/9/23 Peter Eisentraut <peter_e(at)gmx(dot)net>:
> On Thu, 2010-09-23 at 10:12 +0200, Pavel Stehule wrote:
>> 1. It doesn't work with the SQL 92 rules for the sortby list. I can
>> understand that explicit COLLATE doesn't work there, but implicit
>> collation doesn't work either:
>>
>> CREATE TABLE foo(a text, b text COLLATE "cs_CZ.UTF8")
>>
>> SELECT * FROM foo ORDER BY 1 -- produce wrong order
>
> I can't reproduce that. Please provide more details.
Sorry, it is OK. I was confused.
>
>> 2. Why is the default encoding for a collation static? For Czech there
>> are the latin2 variants cs_CZ and cs_CZ.iso88592, so any user with UTF8
>> has to write the encoding explicitly. But UTF8 is now the more widely
>> used and preferred encoding. I think cs_CZ on a UTF8 database should
>> mean cs_CZ.UTF8.
>
> That's tweakable. One idea I had is to strip the ".utf8" suffix from
> locale names when populating the pg_collation catalog, or create both
> versions. I agree that the current way is a bit cumbersome.
>
Yes. Almost all databases are in UTF8 now.
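To make the two options concrete, here is a minimal sketch in Python (not the patch's C code) of how names could be derived when populating pg_collation. The helper `collation_names` and its `create_both` flag are hypothetical, and I am assuming that only a trailing ".utf8" / ".UTF-8" suffix is stripped.

```python
import re

# Hypothetical helper, not PostgreSQL source code.
# Matches a trailing ".utf8", ".UTF8", ".utf-8" or ".UTF-8" suffix.
_UTF8_SUFFIX = re.compile(r"\.utf-?8$", re.IGNORECASE)


def collation_names(locale_name, create_both=True):
    """Return the collation names to register for one OS locale name."""
    short = _UTF8_SUFFIX.sub("", locale_name)
    if short == locale_name:
        # No UTF8 suffix (e.g. "cs_CZ.iso88592"): keep the name unchanged.
        return [locale_name]
    # Either register only the stripped name, or both versions.
    return [short, locale_name] if create_both else [short]
```

With this, "cs_CZ.utf8" would be registered as "cs_CZ" (and optionally also under its full name), so on a UTF8 database COLLATE "cs_CZ" would resolve without spelling out the encoding.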
>> 3. postgres=# select to_char(current_date,'tmday') collate "cs_CZ.utf8";
>> to_char
>> ──────────
>> thursday -- bad result
>> (1 row)
>
> As was already pointed out, collation only covers lc_collate and
> lc_ctype. (It could cover other things, for example an application to
> the money type was briefly discussed, but that's outside the current
> mandate.)
>
ok
> As a point of order, what you wrote above attaches a collation to the
> result of the function call. To get the collation to apply to the
> function call itself, you have to put the collate clause on one of the
> arguments, e.g.,
>
> select to_char(current_date,'tmday' collate "cs_CZ.utf8");
I think collations could be used for this purpose too. I see some
impact, though: this syntax changes a stable function into an immutable
one, and that may not be simple to solve.
>
>> 4. Is there a ToDo list somewhere for the collation implementation?
>
> At the moment it's mostly in the source code. I have a list of notes
> locally that I can clean up and put in the wiki once we agree on the
> general direction.
>
>> 5.
>>
>> postgres=# create table xy(a text, b text collate "cs_CZ");
>> ERROR: collation "cs_CZ" for current database encoding "UTF8" does not exist
>>
>> Could there be a friendlier message or hint, like "you cannot use a
>> different encoding"? This collation is in the pg_collation catalog.
>
> That can surely be polished.
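One possible shape for such a hint, as a rough Python sketch rather than server code: when the name exists in the catalog only for other encodings, say so. The `CATALOG` set, the `lookup_error` function and the HINT wording are all hypothetical; only the ERROR line is taken from the output above.

```python
# Hypothetical in-memory stand-in for the collation catalog:
# a set of (collation name, encoding) pairs.
CATALOG = {
    ("cs_CZ", "LATIN2"),
    ("cs_CZ.iso88592", "LATIN2"),
    ("cs_CZ.utf8", "UTF8"),
}


def lookup_error(name, db_encoding, catalog=CATALOG):
    """Return None if the collation is usable, else an error message."""
    if (name, db_encoding) in catalog:
        return None
    msg = (f'ERROR: collation "{name}" for current database encoding '
           f'"{db_encoding}" does not exist')
    # Encodings for which a collation of the same name does exist.
    others = sorted(enc for (n, enc) in catalog if n == name)
    if others:
        msg += (f'\nHINT: collation "{name}" exists only for encoding '
                f'{", ".join(others)}.')
    return msg
```

For "cs_CZ" on a UTF8 database this would add a hint naming LATIN2, while a completely unknown name would get the plain error only.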
>
>
Regards
Pavel Stehule