From: | Peter Geoghegan <pg(at)heroku(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Marc-Olaf Jaschke <marc-olaf(dot)jaschke(at)s24(dot)com>, Postgres-Bugs <pgsql-bugs(at)postgresql(dot)org> |
Subject: | Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5) |
Date: | 2016-03-22 21:09:52 |
Message-ID: | CAM3SWZTums8PyGAHUpj=3gpt-xiGbbLqUun3mKK8-aTh-z9R2A@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs pgsql-hackers |
On Mon, Mar 21, 2016 at 9:04 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> Can you look at generating a textual representation of the strxfrm()
> blobs in question, using Robert's tool?:
>
> http://www.postgresql.org/message-id/CA+TgmoaOCyQpo8HK9yr6VTuyknWWvqgo7JeXi2kb=gpNveKR+g@mail.gmail.com
I played with this tool myself, on an affected CentOS 6.7 VM:
[vagrant(at)localhost ~]$ ldd --version
ldd (GNU libc) 2.12
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.
I now think that we have this backwards: This isn't a bug in glibc's
strxfrm(); it's a bug in glibc's strcoll(). Minimal testcase with
modified tool, simplified to use ascii-safe strings:
[vagrant(at)localhost ~]$ ./a.out de_DE.UTF-8 'xxx' 'x xx'
"xxx" -> 2323230108080801020202 (11 bytes)
"x xx" -> 2323230108080801020202010235 (14 bytes)
strcmp(arg1, arg2) result: -1
strcoll(arg1, arg2) result: 6
If we assume for the sake of argument that this is a strxfrm() bug and
strcoll() is a reliable source of truth, then I find it very curious
that Germany's Austrian neighbors differ on this point about how text
should be collated:
[vagrant(at)localhost ~]$ ./a.out de_AT.UTF-8 'xxx' 'x xx'
"xxx" -> 2323230108080801020202 (11 bytes)
"x xx" -> 2323230108080801020202010235 (14 bytes)
strcmp(arg1, arg2) result: -1
strcoll(arg1, arg2) result: -1
This surely adds doubt to the idea that strxfrm() in particular is broken.
I find something else inconsistent with the strxfrm() theory: even the
de_DE collation gives strxfrm()/strcoll() self-consistent answers when
we move the rhs argument's space to the far side of its center 'x'
char:
[vagrant(at)localhost ~]$ ./a.out de_DE.UTF-8 'xxx' 'xx x'
"xxx" -> 2323230108080801020202 (11 bytes)
"xx x" -> 2323230108080801020202010335 (14 bytes)
strcmp(arg1, arg2) result: -1
strcoll(arg1, arg2) result: -1
It seems very unlikely that this is because of a legitimate
consideration that strcoll() makes about how German should be collated
(one that strxfrm() fails to make, say).
This is probably a worse situation for affected Postgres systems,
though, because now they have no scope to turn the faulty part of the
system off. I have a hard time believing that it's a good idea to
trust strcoll() to be wrong in a consistent way that has collatable
type opclasses at least follow "Notes to Operator Class Implementors".
I'd like to hear more opinions on that, though, because it's a tricky
thing to reason about.
--
Peter Geoghegan
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2016-03-22 21:57:22 | Re: BUG #14034: Select for update with inner select doesn't return value after committing by other transaction. |
Previous Message | Alvaro Herrera | 2016-03-22 21:07:56 | Re: problem in sql - sum() |
From | Date | Subject | |
---|---|---|---|
Next Message | Robert Haas | 2016-03-22 21:10:11 | Re: WIP: Access method extendability |
Previous Message | Yury Zhuravlev | 2016-03-22 21:04:49 | Re: NOT EXIST for PREPARE |