Re: Duplicate Values or Not?!

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Mike Nolan <nolan(at)gw(dot)tssi(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, John Seberg <johnseberg(at)yahoo(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Duplicate Values or Not?!
Date: 2005-09-17 19:49:24
Message-ID: 877jdfqxtn.fsf@stark.xeocode.com
Lists: pgsql-general

Martijn van Oosterhout <kleptog(at)svana(dot)org> writes:

> On Sat, Sep 17, 2005 at 12:45:17PM -0500, Mike Nolan wrote:
> > > I don't know if it's guaranteed by spec, but it certainly seems silly
> > > for strings to compare equal when they're not. Just because a locale
> > > sorts ignoring case doesn't mean that "sun" and "Sun" are the same. The
> > > only real sensible rule is that strcoll should return 0 only if strcmp
> > > would also return zero...
> >
> > I disagree. Someone who wants true case independence (for whatever reason)
> > needs all aspects of uniqueness such as selects, indexes and groups
> > treating data the same way.
> >
> > This needs to be something the person who creates the instance or the
> > database can control.
>
> Such people need to be looking at citext [1]. My point is that the
> *locale* should not be case-insensitive that way. Consider that if the
> locale treats "sun" and "Sun" identically, then I can't have
> case-sensitivity if I want it. If they are treated differently, I can
> build case-insensitivity on top of it.

Well, consider the case of two different Unicode-encoded strings that
actually represent the same series of characters. They may be byte-wise
different, but there's really no difference at all in the text they contain.
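
As a rough illustration of that point (not something from the original
message), here is a minimal Python sketch. It assumes the standard
unicodedata module and uses "café" purely as a sample word: once with a
precomposed "é" and once with "e" plus a combining accent.

```python
import unicodedata

# "é" as a single precomposed code point (U+00E9) versus "e" followed
# by a combining acute accent (U+0301). Both render as "café".
composed = "caf\u00e9"
decomposed = "cafe\u0301"

# Byte-wise, the UTF-8 encodings of the two strings differ.
bytewise_equal = composed.encode("utf-8") == decomposed.encode("utf-8")

# Normalizing both to NFC shows they are canonically equivalent text.
canonically_equal = (
    unicodedata.normalize("NFC", composed)
    == unicodedata.normalize("NFC", decomposed)
)

print(bytewise_equal, canonically_equal)
```

A strcmp-style comparison sees two different strings here, while a
comparison that is aware of Unicode equivalence could reasonably call
them equal.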

That's a bit different from a collation order that specifies two different
character strings that compare equal. But it would suffer from the same
problem.

Nonetheless, I may agree with you that the world would be a better place if
collation orders never created this situation. But unless we can point to some
spec, or to some solid reason why, if that ever happened, it would cause worse
headaches than this, I think it's necessary to protect the hashing function
from getting out of sync with the btree operators.
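
To make the "out of sync" problem concrete, here is a small Python sketch.
It is not PostgreSQL code: collation_key is a hypothetical stand-in for a
collation under which "sun" and "Sun" compare equal, and bytewise_hash is
a hypothetical stand-in for a hash function that only looks at raw bytes.

```python
import hashlib
from itertools import groupby


def collation_key(s):
    # Hypothetical collation that ignores case, standing in for a
    # strcoll() that returns 0 for "sun" versus "Sun".
    return s.casefold()


def bytewise_hash(s):
    # Hypothetical hash function that sees only the raw bytes.
    return hashlib.md5(s.encode("utf-8")).hexdigest()


values = ["sun", "Sun", "moon"]

# Sort-based grouping, in the spirit of the btree comparison operators:
# values that compare equal under the collation land in one group.
ordered = sorted(values, key=collation_key)
sorted_groups = [list(g) for _, g in groupby(ordered, key=collation_key)]

# Hash-based grouping on raw bytes: "sun" and "Sun" hash differently,
# so they end up in separate buckets.
hash_groups = {}
for v in values:
    hash_groups.setdefault(bytewise_hash(v), []).append(v)

# One possible way to bring the two back into agreement: hash the same
# key that the comparison uses.
consistent_groups = {}
for v in values:
    consistent_groups.setdefault(bytewise_hash(collation_key(v)), []).append(v)

print(len(sorted_groups), len(hash_groups), len(consistent_groups))
```

The sort-based path finds two distinct values while the byte-wise hash
path finds three, so the same query could give different answers
depending on which plan is chosen. Hashing the comparison key is only one
option; making equality fall back to a byte-wise comparison whenever the
collation reports a tie would restore agreement from the other direction.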

--
greg
