Re: WIP patch: Collation support

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc:	Radek Strnad <radek(dot)strnad(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: WIP patch: Collation support
Date:	2008-09-10 09:51:02
Message-ID:	48C79886.9030504@enterprisedb.com
Lists:	pgsql-hackers

Martijn van Oosterhout wrote:
> On Wed, Sep 10, 2008 at 11:29:14AM +0300, Heikki Linnakangas wrote:
>> Radek Strnad wrote:
>>> - because pg_collation and pg_charset are catalogs individual to each
>>> database, if you want to create a database with collation other than
>>> specified, create it in template1 and then create database
>> I have to wonder, is all that really necessary? The feature you're
>> trying to implement is to support database-level collation at first, and
>> perhaps column-level collation later. We don't need support for
>> user-defined collations and charsets for that.
>
> Since the set of collations isn't exactly denumerable, we need some way
> to allow the user to specify the collation they want. The only
> collation PostgreSQL knows about is the C collation. Anything else is
> user-defined.

Let's just use the name of the OS locale, like we do now. Having a
pg_collation catalog just moves the problem elsewhere: we'd still need
something in pg_collation to tie the collation to the OS locale.
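
As a rough illustration of identifying a collation purely by its OS locale name, here is a minimal Python sketch. The helper name `compare_in_locale` is hypothetical and is not part of any patch in this thread. It only shows that a locale name by itself is enough to obtain a comparison function from the C library. The example uses the "C" locale because it exists everywhere; any other name depends on what the OS has installed.

```python
import locale

def compare_in_locale(locale_name, a, b):
    """Compare a and b under the collation of the named OS locale.

    Hypothetical helper: returns a negative, zero, or positive number,
    like strcoll(). Raises locale.Error if the OS does not know the name.
    """
    # Remember the current LC_COLLATE setting so it can be restored.
    previous = locale.setlocale(locale.LC_COLLATE)
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        return locale.strcoll(a, b)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)

# In the "C" locale, uppercase ASCII letters sort before lowercase ones.
print(compare_in_locale("C", "B", "a") < 0)
```

Switching the process-wide locale for each comparison is only for demonstration; a real implementation would presumably avoid doing that per call.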

>>> Design & functionality changes left:
>>> - move retrieving collation from pg_database to pg_type
>> I don't understand this item. What will you move?
>
> Long term, the collation is a property of the type, ...

You might want to provide a default collation for a type as well, but
at the finest granularity you can specify the collation for every (text)
comparison operator in your query. Of course you don't want to do that
for every query, which is why we should provide defaults at different
levels: columns, tables, database. And perhaps types as well, but that's
not the most interesting case.

I'm not sure what the SQL spec says about that, but I believe it
provides syntax and rules for all that.
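
The layering of defaults described above could be sketched roughly as follows. This is only an illustration: the function name `resolve_collation`, the locale names, and the precedence order (explicit per-operator collation, then column, then table, then database) are assumptions, not taken from the SQL spec or from the patch.

```python
from typing import Optional

def resolve_collation(database: str,
                      table: Optional[str] = None,
                      column: Optional[str] = None,
                      explicit: Optional[str] = None) -> str:
    """Pick the collation for one comparison.

    Assumed precedence (not taken from the SQL spec): an explicit
    per-operator collation wins, then the column default, then the
    table default, and finally the database default.
    """
    for candidate in (explicit, column, table):
        if candidate is not None:
            return candidate
    return database

# Illustrative locale names only.
print(resolve_collation("en_US"))
print(resolve_collation("en_US", column="fi_FI"))
print(resolve_collation("en_US", column="fi_FI", explicit="C"))
```

A type-level default, if wanted, would be one more candidate in the same chain.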

>> That's a tricky one. One idea is to prohibit choosing a different
>> collation than the one in the template database, unless we know it's
>> safe to do so without reindexing.
>
> But that puts us back where we started: every database having the same
> collation. We're trying to move away from that. Just reindex everything
> and be done with it.

That's easier said than done, unfortunately.

>> Note that we already have the same problem with encodings. If you create
>> a database with LATIN1 encoding, load it with data, and then use that as
>> a template for a database with UTF-8 encoding, the text data will be
>> incorrectly encoded. We should probably fix that too.
>
> I'd say forbid more than one encoding in a cluster, but that's just my
> opinion :)

Yeah, that's pretty useless, at least without support for different
locales on different databases. But might as well keep it unless there's
a pressing reason to drop it.
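
To make the encoding problem quoted above concrete, here is a small Python sketch of what happens when LATIN1 bytes are treated as UTF-8. The helper `readable_as_utf8` is a hypothetical name used only for this illustration.

```python
# "café" as stored in a LATIN1 database: the "é" is the single byte 0xE9.
latin1_bytes = "caf\u00e9".encode("latin-1")

def readable_as_utf8(data: bytes) -> bool:
    """Return True if data is valid UTF-8 (hypothetical check)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# The LATIN1 bytes are not valid UTF-8; the properly converted ones are.
print(readable_as_utf8(latin1_bytes))
print(readable_as_utf8("caf\u00e9".encode("utf-8")))
```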

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

In response to

Re: WIP patch: Collation support at 2008-09-10 08:48:20 from Martijn van Oosterhout

Responses

Re: WIP patch: Collation support at 2008-09-10 10:23:34 from Martijn van Oosterhout
