Re: Locale implementation questions

From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, gsstark(at)mit(dot)edu, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Locale implementation questions
Date: 2005-09-04 15:01:13
Message-ID: 20050904150055.GB21198@svana.org
Lists: pgsql-hackers

On Sun, Sep 04, 2005 at 10:25:36PM +0900, Tatsuo Ishii wrote:
> > 3. Compiled locale files are large. One UTF-8 locale datafile can
> > exceed a megabyte. Do we want the option of disabling it for small
> > systems?
>
> To avoid the problem, you could dynamically load the compiled
> tables. The charset conversion tables are handled in a similar way.

That's not the point; of course they are loaded dynamically. The
question is when we create the files in the first place. There are
48*15 = 720 combinations, which would amount to tens of megabytes of
essentially useless data. *When* you create the files is an important
question. Compile time is out.

Charset conversion is completely different; there just aren't that many
combinations.

> Also I think it's important to allow user defined collate data. To
> implement the CREATE COLLATE syntax, we need to have that capability
> anyway.

Most OSes allow you to create collation data yourself anyway, so why do
we need to implement this too?

> To be honest, I don't understand why we have to rely on (often broken)
> system locales. I don't think building our own locale data is too
> hard, and once we build it, the maintenance cost will be very small
> since it should not change regularly. Moreover, we could enjoy the
> benefit that PostgreSQL handles collations in a correct manner on any
> platform which PostgreSQL supports.

You say building our own locale data is not hard. I disagree; it's a
waste of time we can do without. Unless you know the language yourself,
you cannot check changes made by anybody else. If there's an error in
locale ordering, take it up with your OS distributor.

I also think we open ourselves to questions like:

1. My locale is supported by the system but not by PostgreSQL, why?
2. My locale was supported last release but not this one, why?
3. Why does PostgreSQL sort differently from 'sort' or any other app on
my system?
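
To illustrate question 3, here is a minimal sketch (not part of the
original message) of how an application ends up with the system's sort
order: it asks the C library to collate under a named locale. It uses
Python's standard locale module, which wraps the C library's collation
routines. The helper name sort_with_locale, the word list, and the locale
name "en_US.UTF-8" are assumptions for illustration; that locale may not
be installed on a given machine.

```python
import locale


def sort_with_locale(words, locale_name):
    """Sort words using the C library's collation rules for locale_name.

    Returns None if the system does not provide that locale.
    (Hypothetical helper, for illustration only.)
    """
    # Remember the current collation setting so it can be restored.
    saved = locale.setlocale(locale.LC_COLLATE)
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    try:
        # strxfrm turns each string into a key whose plain comparison
        # matches the locale's collation order.
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)


words = ["banana", "Apple", "cherry", "apple"]

# The "C" locale always exists and compares by code point.
c_order = sort_with_locale(words, "C")
print("C:", c_order)

# A language-specific locale, if installed, may order the same words
# differently; the result depends entirely on the system's locale data.
system_order = sort_with_locale(words, "en_US.UTF-8")
print("en_US.UTF-8:", system_order)
```

Any program that collates this way inherits whatever the OS locale data
says, which is why a database shipping its own locale data could disagree
with 'sort' on the same machine.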

> Right. We Japanese (and probably Chinese too) have been bugged by the
> broken multibyte locales for a long time. Using the C locale helps us to a
> certain extent, but for Unicode we need correct locale data, otherwise
> the sorted data will be complete chaos.

OK, is glibc still wrong, or are they just implementing the Unicode
standard and that's what's wrong?

All I'm saying is that we need to allow use of system locales until our
native locale support is mature. In the end something like ICU
(http://icu.sourceforge.net/) will end up obsoleting us. Nobody (in
free software anyway) uses it yet, but eventually it may be viable to
require it in order to allow system-independent locales.
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
