From: | Martijn van Oosterhout <kleptog(at)svana(dot)org> |
---|---|
To: | Greg Stark <gsstark(at)MIT(dot)EDU> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, John Seberg <johnseberg(at)yahoo(dot)com>, pgsql-general(at)postgresql(dot)org |
Subject: | Re: Duplicate Values or Not?! |
Date: | 2005-09-17 17:13:50 |
Message-ID: | 20050917171348.GA11697@svana.org |
Lists: | pgsql-general |
On Sat, Sep 17, 2005 at 11:50:44AM -0400, Greg Stark wrote:
> Hm. Some experimentation shows that at least on glibc's locale definitions the
> strings that I thought compared equal don't actually compare equal.
> Capitalization, punctuation, white space, while they're basically ignored in
> general in non-C locales do seem to compare non-equal when they're the only
> differentiating factor.
>
> Is this guaranteed by any spec? Or is counting on this behaviour unsafe?
I don't know if it's guaranteed by spec, but it certainly seems silly
for strings to compare equal when they're not. Just because a locale
sorts ignoring case doesn't mean that "sun" and "Sun" are the same. The
only real sensible rule is that strcoll should return 0 only if strcmp
would also return zero...
If you actually use strxfrm on glibc you'll see the result comes out
approximately twice as long. The first n bytes are sort of case-folded
versions of the original characters, the second n bytes are some
kind of class identification.
I think that all the spec guarantees is that strcoll(a,b) and
strcmp(strxfrm(a),strxfrm(b)) agree in sign. If strcoll is returning zero for two
non-identical strings, they must strxfrm to the same thing, so that may
be a solution.
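For anyone who wants to check this on their own platform, here is a
rough sketch in Python, whose locale.strcoll and locale.strxfrm wrap the
C library's collation routines. The helper names and the sample strings
are made up for illustration. It only tests the two properties discussed
above; it does not prove anything about what a spec requires.

```python
import locale


def sign(n):
    """Collapse an integer to -1, 0 or 1."""
    return (n > 0) - (n < 0)


def strcoll_matches_strxfrm(a, b):
    """True if strcoll(a, b) and a comparison of the strxfrm keys agree in sign."""
    key_a, key_b = locale.strxfrm(a), locale.strxfrm(b)
    via_keys = (key_a > key_b) - (key_a < key_b)
    return sign(locale.strcoll(a, b)) == via_keys


def equal_but_not_identical(words):
    """Return the pairs that collate as equal although the strings differ."""
    pairs = []
    for i, a in enumerate(words):
        for b in words[i + 1:]:
            if a != b and locale.strcoll(a, b) == 0:
                pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    # Hypothetical sample: differences in case, white space and punctuation only.
    samples = ["sun", "Sun", "sun ", "s-un"]
    # Report whichever collation locale is currently active (unchanged here).
    print("LC_COLLATE:", locale.setlocale(locale.LC_COLLATE))
    for a in samples:
        for b in samples:
            if not strcoll_matches_strxfrm(a, b):
                print("strcoll and strxfrm disagree for", repr(a), repr(b))
    print("equal but not identical:", equal_but_not_identical(samples))
```

The script reports on whatever collation locale is already active, so to
test a non-C locale call locale.setlocale(locale.LC_COLLATE, "") (or
name a locale installed on your system) before running the checks. An
empty "equal but not identical" list means strcoll returned zero only
for identical strings in that sample.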
Anyway, long term the plan is to move to a cross-platform locale
library so hopefully broken locale libraries will be a thing of the
past...
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.