Windows UTF-8, non-ICU collation trouble

From:	Noah Misch <noah(at)leadboat(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Windows UTF-8, non-ICU collation trouble
Date:	2019-12-06 06:34:01
Message-ID:	20191206063401.GB1629883@rfd.leadboat.com
Lists:	pgsql-hackers

We use system UTF-16 collation to implement UTF-8 collation on Windows. The
PostgreSQL security team received a report, from Timothy Kuun, that this
collation does not uphold the "symmetric law" and "transitive law" that we
require for btree operator classes. The attached test program demonstrates
this. http://www.delphigroups.info/2/62/478610.html quotes reports of that
problem going back eighteen years. Most code points are unaffected. Indexing
an affected code point using such a collation can cause btree index scans to not
find a row they should find and can make a UNIQUE or PRIMARY KEY constraint
admit a duplicate. The security team determined that this doesn't qualify as a
security vulnerability, but it's still a bug.

All I can think to do is issue a warning whenever a CREATE DATABASE or CREATE
COLLATION combines UTF8 encoding with a locale having this problem. In a
greenfield, I would forbid affected combinations of encoding and locale. That
is too harsh, considering the few code points affected and the difficulty of
changing the collation of existing databases. For CREATE DATABASE, every locale
except LOCALE=C would trigger the warning. For CREATE COLLATION, ICU locales
would not trigger it either. Hence, the chief workaround is to use LOCALE=C at
the database level and ICU collations for indexes and operator invocations.
(The ability to use an ICU collation at the database level would improve the
user experience here.) Better ideas?

Attachment	Content-Type	Size
locale-sort.c	text/plain	1.3 KB

Responses

Re: Windows UTF-8, non-ICU collation trouble at 2019-12-06 06:56:08 from Thomas Munro
