From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Greg Stark <gsstark(at)mit(dot)edu> |
Cc: | John Seberg <johnseberg(at)yahoo(dot)com>, pgsql-general(at)postgresql(dot)org |
Subject: | Re: Duplicate Values or Not?! |
Date: | 2005-09-17 22:00:21 |
Message-ID: | 9533.1126994421@sss.pgh.pa.us |
Lists: | pgsql-general |
Greg Stark <gsstark(at)mit(dot)edu> writes:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>> If that does change the results, it indicates you've got strings which
>> are bytewise different but compare equal according to strcoll(). We've
>> seen this and other misbehaviors from some locale definitions when faced
>> with data that is invalid per the encoding the locale expects.
> There are plenty of non-bytewise-identical strings that do legitimately
> compare equal in various locales. Does the hash code hash strxfrm or the
> original bytes?
I think you are jumping to conclusions. I have not yet seen it
demonstrated that any locale definition in use in the wild intends to
compare nonidentical strings as equal. On the other hand, we have seen
plenty of cases of strcoll simply failing (delivering results that are
not even self-consistent) when faced with data it considers invalid.

I notice that the SUS permits strcoll to set errno if given invalid
data:
http://www.opengroup.org/onlinepubs/007908799/xsh/strcoll.html
We are not currently checking for that, but probably we should be.

regards, tom lane
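As a rough illustration of the errno check discussed in this message, here is a minimal sketch in C. It is not PostgreSQL code: the names `checked_strcoll` and `coll_status` are hypothetical, invented for this example. Since strcoll reserves no return value for errors, the SUS page recommends setting errno to 0 before the call and inspecting it afterwards. The sketch also flags the case raised earlier in the thread, where two strings collate as equal although their bytes differ.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical result codes for this sketch; not part of any real API. */
typedef enum {
    COLL_OK,                  /* *cmp holds a usable strcoll() result */
    COLL_ERROR,               /* strcoll() set errno; *cmp is not meaningful */
    COLL_EQUAL_BUT_DIFFERENT  /* strcoll() returned 0 but the bytes differ */
} coll_status;

/*
 * Hypothetical wrapper: compare a and b with strcoll() under the current
 * LC_COLLATE setting and report whether the result can be trusted.
 */
static coll_status
checked_strcoll(const char *a, const char *b, int *cmp)
{
    int result;

    assert(a != NULL && b != NULL && cmp != NULL);

    /* No return value is reserved for errors, so clear errno first. */
    errno = 0;
    result = strcoll(a, b);
    if (errno != 0) {
        /* e.g. EINVAL for data outside the collating sequence's domain */
        *cmp = 0;
        return COLL_ERROR;
    }

    *cmp = result;

    /* Equal per the locale, yet not bytewise identical. */
    if (result == 0 && strcmp(a, b) != 0)
        return COLL_EQUAL_BUT_DIFFERENT;

    return COLL_OK;
}
```

A caller would treat `COLL_ERROR` as "reject or report the input" and `COLL_EQUAL_BUT_DIFFERENT` as a hint that the data or the locale definition deserves a closer look. Whether a given C library actually sets errno in these situations varies by platform, so a clean errno does not prove the input was valid.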