From: | Peter Geoghegan <pg(at)heroku(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marc-Olaf Jaschke <marc-olaf(dot)jaschke(at)s24(dot)com>, Postgres-Bugs <pgsql-bugs(at)postgresql(dot)org> |
Subject: | Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5) |
Date: | 2016-03-23 02:33:49 |
Message-ID: | CAM3SWZSzE13i=9pDseTn9XzE21kQ_qHnb7JOkDNUs3akH=jswQ@mail.gmail.com |
Lists: | pgsql-bugs pgsql-hackers |
On Tue, Mar 22, 2016 at 3:06 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> Well, if we implement a compatibility GUC that shuts off our
> dependency on strxfrm(), people can go back to having 9.5 be no more
> broken than 9.4 was. I vote we do that and go home.
I don't have a problem with that idea, but I fear "no more broken than
9.4 was" might be a very low bar for certain systems and collations.
Abbreviated keys may have simply unmasked the problem in some cases.
Consider:
[vagrant(at)localhost ~]$ LC_COLLATE=en_us sort strings.txt <-- correct
x xx
x xx"
xxx
xxx"
[vagrant(at)localhost ~]$ LC_COLLATE=de_DE sort strings.txt <-- wrong
xxx
xxx"
x xx
x xx"
[vagrant(at)localhost ~]$ ./strxfrm-binary de_DE.UTF-8 'xxx' 'x xx'
"xxx" -> 2323230108080801020202 (11 bytes)
"x xx" -> 2323230108080801020202010235 (14 bytes)
strcmp(arg1, arg2) result: -1
strcoll(arg1, arg2) result: 6
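For anyone who wants to try the same comparison without a custom binary, here is a minimal sketch in Python. It is not the strxfrm-binary tool used above; `sign` and `check_pair` are hypothetical helper names, and it assumes the de_DE.UTF-8 locale is installed (it falls back to the current locale otherwise). It only reports whether strcoll() and the ordering of the strxfrm() results agree for one pair of strings.

```python
import locale


def sign(n):
    """Collapse any integer to -1, 0 or 1."""
    return (n > 0) - (n < 0)


def check_pair(a, b, coll=locale.strcoll, xfrm=locale.strxfrm):
    """Compare a and b two ways and report whether the answers agree.

    Returns (sign of coll(a, b), sign from comparing xfrm keys, agree?).
    coll and xfrm are parameters so other implementations can be swapped in.
    """
    by_coll = sign(coll(a, b))
    key_a, key_b = xfrm(a), xfrm(b)
    by_xfrm = (key_a > key_b) - (key_a < key_b)
    return by_coll, by_xfrm, by_coll == by_xfrm


if __name__ == "__main__":
    try:
        # Assumption: this locale name exists on the machine being tested.
        locale.setlocale(locale.LC_COLLATE, "de_DE.UTF-8")
    except locale.Error:
        print("de_DE.UTF-8 not installed; using the current locale")
    print(check_pair("xxx", "x xx"))
```

A third element of False means the two functions disagree on the order of that pair, which is the situation shown in the output above.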
My concern was not merely "academic" (i.e. it was not limited in scope
to things that don't make B-Tree indexes corrupt). I'm pretty sure that we
need to start thinking of this as a problem with strcoll() that
strxfrm() does not have for more fundamental reasons, because
strcoll() says that the first string in the de_DE sorted list is
*greater* than the third string. That's wrong, and not just because
strxfrm() gives an intuitively correct answer -- it's wrong
specifically because the transitive law has been broken.
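
One way to look for this kind of breakage mechanically is to brute-force every ordered triple from a list of strings and flag any where the comparator says a <= b and b <= c but a > c. The sketch below is only an illustration under that definition: `transitivity_violations` is a hypothetical helper, the four strings are the ones from strings.txt above, and the de_DE.UTF-8 locale name is assumed to be installed.

```python
import locale
from itertools import permutations


def transitivity_violations(strings, cmp):
    """Return every (a, b, c) where cmp claims a <= b and b <= c, yet a > c."""
    found = []
    for a, b, c in permutations(strings, 3):
        if cmp(a, b) <= 0 and cmp(b, c) <= 0 and cmp(a, c) > 0:
            found.append((a, b, c))
    return found


if __name__ == "__main__":
    try:
        # Assumption: this locale name exists on the machine being tested.
        locale.setlocale(locale.LC_COLLATE, "de_DE.UTF-8")
    except locale.Error:
        print("de_DE.UTF-8 not installed; using the current locale")
    samples = ["x xx", 'x xx"', "xxx", 'xxx"']
    for triple in transitivity_violations(samples, locale.strcoll):
        print("violation:", triple)
```

The check is cubic in the number of strings, so it is only practical for small samples. An empty result does not prove that a collation is sound; it only says that no violation exists among the strings tried.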
--
Peter Geoghegan