From: | Peter Geoghegan <peter(at)2ndquadrant(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Greg Stark <stark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: sortsupport for text |
Date: | 2012-06-17 16:43:17 |
Message-ID: | CAEYLb_XpBGAUA=UJ=W2rK8eyMBYYAwBbqy0Q7W-ziMb+qwn1-w@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 17 June 2012 17:01, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I'm not sure I agree with this decision; why should we presume to know
>> better than the glibc locale what constitutes equality?
>
> The killer reason why it must be like that is that you can't use hash
> methods on text if text equality is some unknown condition subtly
> different from bitwise equality.
Fair enough, but I doubt that we need to revert the changes made in
this commit to texteq in addition to the changes I'd like to see in
order to be semantically self-consistent. That is because there is
often a distinction made between equality and equivalence, and we
could adopt this distinction. strcoll() could be said to be just
making a representation that its two arguments are equivalent (and not
necessarily equal) when it returns 0. This distinction is explicitly
made in the C++ standard library, and failing to understand it can
result in bugs:
http://www.cplusplus.com/reference/algorithm/equal_range/
Note the use of the word "equivalent" rather than "equal" in the text.
equal_range is a bit of a misnomer.
This distinction is important enough to have warranted an entire
subsection of the book "Effective STL" by Scott Meyers, a
well-respected expert on the language. This comes up more often than
you'd think - "std::set::insert" determines if an element already
exists (to know if it must replace it) based on equivalency (usually,
though not necessarily, defined in terms of operator< ), whereas the
"find" algorithm finds elements based on equality (operator==).
> My recollection is that there were
> some other problems as well, but I'm too lazy to search the archives
> for you.
Fair enough. I'll search for it myself later. I'm about to head out now.
>> It's seems very likely that the main
>> one was the then-need to guard against poor quality qsort()
>> implementations that went quadratic in the face of lots of duplicates,
>
> No, I don't recall that that had anything to do with it.
Oh, okay. It looked very much like the "avoid equality at all costs"
thing you still see some of in tuplesort.c .
--
Peter Geoghegan http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
From | Date | Subject | |
---|---|---|---|
Next Message | Euler Taveira | 2012-06-17 16:49:04 | Re: libpq compression |
Previous Message | Tom Lane | 2012-06-17 16:29:53 | Re: libpq compression |