From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Per-column collation
Date: 2010-11-16 19:59:50
Message-ID: AANLkTimKxyd5aNfo9OP6WDEaCE7HSFP9LBzhMCjQyL5F@mail.gmail.com
Lists: pgsql-hackers
2010/11/16 Peter Eisentraut <peter_e(at)gmx(dot)net>:
> On tis, 2010-11-16 at 20:00 +0100, Pavel Stehule wrote:
>> yes - my first question is: why do we need to specify an encoding when
>> only one encoding is supported? I can't use cs_CZ.iso88592 when my db
>> uses UTF8 - btw the message for that case is wrong:
>>
>> yyy=# select * from jmena order by jmeno collate "cs_CZ.iso88592";
>> ERROR: collation "cs_CZ.iso88592" for current database encoding
>> "UTF8" does not exist
>> LINE 1: select * from jmena order by jmeno collate "cs_CZ.iso88592";
>> ^
>
> Sorry, is there some mistake in that message?
>
It is unclear - I'd expect something like "cannot use collation
"cs_CZ.iso88592", because your database uses the UTF8 encoding".
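To make the complaint concrete, a minimal sketch using the jmena/jmeno example from the quoted message, in a database created with UTF8 encoding:

```sql
-- Database encoding: UTF8
SELECT * FROM jmena ORDER BY jmeno COLLATE "cs_CZ.iso88592";
-- ERROR: collation "cs_CZ.iso88592" for current database encoding "UTF8" does not exist

SELECT * FROM jmena ORDER BY jmeno COLLATE "cs_CZ.utf8";
-- works, but the encoding is now hard-coded into the SQL statement
```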
>> I don't know why, but the preferred encoding for Czech is iso88592 now -
>> yet I can't use it - so I can't use the names "czech" or "cs_CZ". I
>> always have to use the full name "cs_CZ.utf8". That's wrong. Worse, from
>> that moment on my application depends on the first encoding used - I
>> can't change the encoding without refactoring my SQL statements, because
>> the encoding is hard-coded there (in the COLLATE clause).
>
> I can only look at the locales that the operating system provides. We
> could conceivably make some simplifications like stripping off the
> ".utf8", but then how far do we go and where do we stop? Locale names
> on Windows look different too. But in general, how do you suppose we
> should map an operating system locale name to an "acceptable" SQL
> identifier? You might hope, for example, that we could look through the
> list of operating system locale names and map, say,
>
> cs_CZ -> "czech"
> cs_CZ.iso88592 -> "czech"
> cs_CZ.utf8 -> "czech"
> czech -> "czech"
>
> but we have no way to actually know that these are semantically similar,
> so this illustrated mapping is AI-complete. We need to take the locale
> names as is, and that may or may not carry encoding information.
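The ambiguity Peter describes is visible directly in the catalog: collation names are taken from the operating system verbatim, so several spellings of what may be the same Czech locale can coexist. A hypothetical query (the actual names depend on which locales the OS provides):

```sql
SELECT collname
FROM pg_collation
WHERE collname ILIKE 'cs%' OR collname ILIKE 'czech%';
-- might return, depending on the operating system:
--   cs_CZ
--   cs_CZ.iso88592
--   cs_CZ.utf8
--   czech
-- nothing in these names tells us whether they denote the same collation rules
```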
>
>> So I don't understand why you fill the pg_collation table with thousands
>> of collations that are impossible to use. If I use UTF8, then there
>> should be only UTF8-based collations. And if you need to work with many
>> collations, then I'd prefer UTF8 - at least for the central European
>> region. If somebody here wants several collations - say a combination of
>> cs, de, and en - then they must use latin2 plus latin1, or UTF8. So I
>> think the encoding should not be part of the collation name when that is
>> possible.
>
> Different databases can have different encodings, but the pg_collation
> catalog is copied from the template database in any case. We can't do
> any changes in system catalogs as we create new databases, so the
> "useless" collations have to be there. There are only a few hundred,
> actually, so it's not really a lot of wasted space.
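A rough way to see which of the imported collations are actually usable in the current database - assuming the catalog records each collation's encoding in a collencoding column, as in the committed version of this feature:

```sql
SELECT collname
FROM pg_collation
WHERE collencoding = pg_char_to_encoding(pg_catalog.getdatabaseencoding())
   OR collencoding = -1;  -- -1 marks encoding-independent collations
```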
>
I have no problem with the size. I just think the current behavior isn't
practical. When the database encoding is UTF8, I expect "cs_CZ" or
"czech" to refer to the UTF8 variant. I understand that template0 must
carry all the locales, and I understand why the current behavior is what
it is, but it is very user-unfriendly. Actually, only old applications in
the Czech Republic use latin2 - almost all use UTF-8 - yet latin2 is now
the preferred one. This is bad and should be solved.

Regards
Pavel