From: | Greg Stark <gsstark(at)mit(dot)edu> |
---|---|
To: | Greg Stark <gsstark(at)MIT(dot)EDU> |
Cc: | Dennis Gearon <gearond(at)fireserve(dot)net>, Greg Stark <gsstark(at)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-general(at)postgresql(dot)org |
Subject: | Re: Collation rules and multi-lingual databases |
Date: | 2003-08-22 15:43:19 |
Message-ID: | 87ekzditpk.fsf@stark.dyndns.tv |
Lists: | pgsql-general |
Greg Stark <gsstark(at)MIT(dot)EDU> writes:
> Dennis Gearon <gearond(at)fireserve(dot)net> writes:
>
> > I think it would be nice, and I may write it eventually, to have a function
> > called:
> >
> > COLLATION_VALUE( 'string', 'encoding' )
>
> Indeed that would be really nice. I wish I had that and a pony.
>
> Unfortunately my understanding is that the collation rules are simply too
> complex to allow such a function in general. It's too bad because it would
> indeed eliminate a lot of the problems in a single swoop.
Uh, so apparently I'm on crack and this is *precisely* how the l10n collation
rules work. Sorry for jumping in with an uninformed opinion.
> Effectively, the way these functions work is by applying a mapping to
> transform the characters in a string to a byte sequence that represents
> the string's position in the collating sequence of the current locale.
> Comparing two such byte sequences in a simple fashion is equivalent to
> comparing the strings with the locale's collating sequence.
>
> The functions `strcoll' and `wcscoll' perform this translation
> implicitly, in order to do one comparison. By contrast, `strxfrm' and
> `wcsxfrm' perform the mapping explicitly. If you are making multiple
> comparisons using the same string or set of strings, it is likely to be
> more efficient to use `strxfrm' or `wcsxfrm' to transform all the
> strings just once, and subsequently compare the transformed strings
> with `strcmp' or `wcscmp'.
Given this, it should be easy to write a collation_value(string,locale) C
function that switches the collation order, calls strxfrm, and then restores
the collation order.
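
A minimal sketch of that idea, assuming the name collation_value from the text above, a NUL-terminated input string, and a caller that frees the result. It is plain C rather than PostgreSQL code, so it leaves out the wrapper a real server-side function would need. It saves the current LC_COLLATE setting, switches to the requested locale, builds the sort key with strxfrm, and restores the saved setting whether or not the switch succeeded.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not an existing PostgreSQL or libc function):
 * returns a malloc'd sort key for `s` under `locale`, or NULL on failure.
 * The caller must free() the result. */
char *collation_value(const char *s, const char *locale)
{
    assert(s != NULL && locale != NULL);

    /* setlocale() may return a pointer to static storage that a later
     * call overwrites, so copy the current setting before switching. */
    const char *cur = setlocale(LC_COLLATE, NULL);
    if (cur == NULL)
        return NULL;
    size_t len = strlen(cur) + 1;
    char *saved = malloc(len);
    if (saved == NULL)
        return NULL;
    memcpy(saved, cur, len);

    char *key = NULL;
    if (setlocale(LC_COLLATE, locale) != NULL) {
        /* With a zero size strxfrm() only reports the length needed
         * (excluding the terminating NUL); the destination may be NULL. */
        size_t n = strxfrm(NULL, s, 0);
        key = malloc(n + 1);
        if (key != NULL)
            strxfrm(key, s, n + 1);
    }

    /* Restore the previous collation order on every path. */
    setlocale(LC_COLLATE, saved);
    free(saved);
    return key;
}
```

Two keys produced for the same locale can then be compared with strcmp, and the result should order the strings the way strcoll would under that locale. Keys from different locales are not comparable with each other.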
I fear memory leaks or performance losses on frequent locale switches like
this, but it should be easy enough to try out. I don't see any problems with
postgres as long as it's possible to ensure the locale is always switched back
properly. It might not be thread-safe, though, since setlocale changes the
locale for the whole process.
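
If thread safety or the cost of switching turns out to matter, one possible alternative is to avoid touching the process-wide locale at all. The sketch below assumes a system that provides the POSIX.1-2008 per-locale interfaces newlocale, strxfrm_l and freelocale (glibc does, for example); the name collation_value_l is hypothetical. Some platforms declare these in a different header or lack them entirely.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: like collation_value(), but uses a private
 * locale object instead of switching the global LC_COLLATE setting.
 * Returns a malloc'd sort key, or NULL on failure. Caller frees. */
char *collation_value_l(const char *s, const char *locale)
{
    assert(s != NULL && locale != NULL);

    /* Build a locale object that carries only the collation category. */
    locale_t loc = newlocale(LC_COLLATE_MASK, locale, (locale_t)0);
    if (loc == (locale_t)0)
        return NULL;

    /* First call reports the key length, second call fills the buffer. */
    size_t n = strxfrm_l(NULL, s, 0, loc);
    char *key = malloc(n + 1);
    if (key != NULL)
        strxfrm_l(key, s, n + 1, loc);

    freelocale(loc);
    return key;
}
```

A caller that transforms many strings for the same locale could create the locale object once and reuse it, rather than calling newlocale for every string as this sketch does.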
At worst I could always call strxfrm in the application for each locale I care
about when inserting the data. That would bloat my tables for nothing though.
So it's looking like I might get my pony after all.
--
greg