From: | Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp> |
---|---|
To: | phede-ml(at)islande(dot)org |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Unicode combining characters |
Date: | 2001-09-25 00:56:36 |
Message-ID: | 20010925095636E.t-ishii@sra.co.jp |
Lists: | pgsql-hackers |
> So, this shows two problems :
>
> - length() on the server side doesn't handle Unicode correctly [I have
> the same result with char_length()], and returns the number of chars
> (as it is, however, advertised to do), rather than the length of the
> string.
This is a known limitation.
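To make the discrepancy concrete, here is a minimal Python sketch (not PostgreSQL code) of how a per-code-point count differs from a count of base characters once combining marks are involved. The helper names are hypothetical, and skipping code points with a non-zero canonical combining class is only an approximation of what a reader perceives as one character; it is not full grapheme cluster segmentation.

```python
import unicodedata


def code_point_length(s: str) -> int:
    """Number of Unicode code points, i.e. what a per-character count reports."""
    return len(s)


def base_character_length(s: str) -> int:
    """Count only code points that are not combining marks.

    Assumption: a non-zero canonical combining class marks a combining
    character. This is an approximation, not grapheme segmentation.
    """
    return sum(1 for ch in s if unicodedata.combining(ch) == 0)


decomposed = "e\u0301te\u0301"   # "ete" with U+0301 COMBINING ACUTE ACCENT after each e
precomposed = "\u00e9t\u00e9"    # the same word using precomposed U+00E9

print(code_point_length(decomposed), base_character_length(decomposed))    # 5 3
print(code_point_length(precomposed), base_character_length(precomposed))  # 3 3
```

Both strings display as the same three-letter word, but the decomposed spelling is counted as five characters.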
> - the psql frontend makes the same mistake.
>
> I am using version 7.1.3 (debian sid), so it may have been corrected
> in the meantime (in this case, I apologise, but I have only recently
> started again to use PostgreSQL and I haven't followed -hackers long
> enough).
>
>
> => I think fixing psql shouldn't be too complicated, as glibc
> should provide the locale and return the right values (is this
> the case ? and what happens for combined latin + chinese characters
> for example ? I'll have to try that later). If it's not fixed already,
> do you want me to look at this ? [it will take some time, as I haven't
> set up any development environment for postgres yet, and I'm away for
> one week from thursday].
Sounds great.
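As a rough illustration of what the frontend would need for column alignment, the following Python sketch computes a display width without relying on the C library's locale. It is not psql code; `display_width` is a hypothetical helper, and it assumes that combining marks occupy no column and that East Asian wide or fullwidth characters occupy two. Control characters and other special cases are ignored.

```python
import unicodedata


def display_width(s: str) -> int:
    """Approximate number of terminal columns needed to print s."""
    width = 0
    for ch in s:
        if unicodedata.combining(ch) != 0:
            # Combining mark: drawn over the previous character, no extra column.
            continue
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            # Wide or fullwidth character (e.g. CJK ideographs): two columns.
            width += 2
        else:
            width += 1
    return width


print(display_width("e\u0301"))   # 1: one base letter plus one combining accent
print(display_width("a\u4e2d"))   # 3: one Latin letter plus one Chinese character
```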
> I was wondering if some people have already thought about this, or
> already done something, or if some of you are interested in this. If
> nobody does anything, I'll do something eventually, probably before
> Christmas (I don't have much time for this, and I don't need the
> functionality right now), but if there is an interest, I could team
> with others and develop it faster :)
I'm very interested in your point. I will start studying [1][2] after
the beta freeze.
> Anyway, I'm open to suggestions :
>
> - implement it in C, in the core,
>
> - implement it in C, as contributed custom functions,
This may be a good starting point.
> I can't really accept a solution which would rely on the underlying
> libc, as it may not provide the necessary locales (or maybe, then,
I totally agree here.
> The main functions I foresee are :
>
> - provide a normalisation function to all 4 forms,
>
> - provide a collation_key(text, language) function, as the calculation
> of the key may be expensive, some may want to index on the result (I
> would :) ),
>
> - provide a collation algorithm, using the two previous facilities,
> which can do primary to tertiary collation (cf TR#10 for a detailed
> explanation).
>
> I haven't looked at PostgreSQL code yet (shame !), so I may be
> completely off-track, in which case I'll withdraw the proposal and won't
> bother you again (on that subject, that is ;) )...
>
> Comments ?
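To give the proposed functions a concrete shape, here is a small Python sketch. It is only an illustration under stated assumptions, not a proposal for the actual implementation: `normalise_all` and `collation_key` are hypothetical names, the key takes no language argument, and the three levels (base letters, accents, case) are a toy stand-in for the weights defined in TR#10, which this sketch does not use.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def normalise_all(s: str) -> dict:
    """Return the string in each of the four Unicode normalisation forms."""
    return {form: unicodedata.normalize(form, s) for form in FORMS}


def collation_key(text: str) -> tuple:
    """Toy three-level sort key: (base letters, accents, case).

    Assumption: this only mimics the idea of primary/secondary/tertiary
    levels; it does not implement the TR#10 algorithm or its weight tables.
    """
    primary, secondary, tertiary = [], [], []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch) != 0:
            if secondary:
                secondary[-1] += ch  # attach the accent to the preceding base letter
            continue
        primary.append(ch.casefold())   # level 1: letter identity, case ignored
        secondary.append("")            # level 2: accents, filled in above
        tertiary.append(ch.isupper())   # level 3: case
    return (tuple(primary), tuple(secondary), tuple(tertiary))


print(normalise_all("\ufb01")["NFKC"])   # "fi": the ligature is split by the K forms
words = ["c\u00f4te", "Cote", "cot\u00e9", "cote"]
print(sorted(words, key=collation_key))  # ['cote', 'Cote', 'coté', 'côte']
```

Because the key is an ordinary value computed once per string, it could be stored or indexed, which is the point of exposing it as a separate function.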
--
Tatsuo Ishii