Re: Duplicate Values or Not?!

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: John Seberg <johnseberg(at)yahoo(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Duplicate Values or Not?!
Date: 2005-09-17 22:00:21
Message-ID: 9533.1126994421@sss.pgh.pa.us
Lists: pgsql-general

Greg Stark <gsstark(at)mit(dot)edu> writes:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>> If that does change the results, it indicates you've got strings which
>> are bytewise different but compare equal according to strcoll(). We've
>> seen this and other misbehaviors from some locale definitions when faced
>> with data that is invalid per the encoding the locale expects.

> There are plenty of non-bytewise-identical strings that do legitimately
> compare equal in various locales. Does the hash code hash strxfrm or the
> original bytes?

I think you are jumping to conclusions. I have not yet seen it
demonstrated that any locale definition in use in the wild intends to
compare nonidentical strings as equal. On the other hand, we have seen
plenty of cases of strcoll simply failing (delivering results that are
not even self-consistent) when faced with data it considers invalid.
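
The two cases can be told apart with a small test harness. What follows is
only a sketch, not PostgreSQL code: the function names are hypothetical, and
it assumes NUL-terminated strings and a locale already selected with
setlocale(). The first check looks for one kind of self-inconsistency
(a sorts before b, yet b does not sort after a); the second flags pairs
that strcoll() calls equal although their bytes differ.

```c
#include <string.h>

/* Reduce a comparison result to -1, 0, or +1. */
static int
sign_of(int v)
{
    return (v > 0) - (v < 0);
}

/*
 * Hypothetical helper: returns 1 if strcoll() gives mirror-image answers
 * for (a, b) and (b, a), and 0 if the two results contradict each other.
 */
int
strcoll_is_antisymmetric(const char *a, const char *b)
{
    return sign_of(strcoll(a, b)) == -sign_of(strcoll(b, a));
}

/*
 * Hypothetical helper: returns 1 if the current locale's collation treats
 * a and b as equal even though they are not bytewise identical.
 */
int
strcoll_equal_but_bytes_differ(const char *a, const char *b)
{
    return strcoll(a, b) == 0 && strcmp(a, b) != 0;
}
```

Running both checks over the suspect column values under the database's
locale would show whether the locale is deliberately equating distinct
strings or is returning contradictory results.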

I notice that the SUS permits strcoll to set errno if given invalid
data:
http://www.opengroup.org/onlinepubs/007908799/xsh/strcoll.html
We are not currently checking for that, but probably we should be.
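
A minimal sketch of such a check follows. It is an illustration, not the
PostgreSQL implementation, and checked_strcoll is a hypothetical name.
Because strcoll() reserves no return value to signal an error, the caller
has to clear errno before the call and inspect it afterwards. Some C
libraries may never set errno here, so this catches only what the library
chooses to report.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper (not PostgreSQL code): compare two NUL-terminated
 * strings with strcoll() and report whether the call flagged an error.
 *
 * Returns 0 and stores the comparison result in *result on success.
 * Returns -1 if strcoll() set errno; *result is then left untouched.
 */
int
checked_strcoll(const char *a, const char *b, int *result)
{
    int cmp;

    assert(a != NULL && b != NULL && result != NULL);

    /* No return value signals failure, so errno must be cleared first. */
    errno = 0;
    cmp = strcoll(a, b);
    if (errno != 0)
        return -1;

    *result = cmp;
    return 0;
}
```

A caller would treat a -1 return as "input not valid for this locale" and
raise an error instead of trusting the comparison.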

regards, tom lane
