From: | Jeremy Schneider <schneider(at)ardentperf(dot)com> |
---|---|
To: | Jeff Davis <pgsql(at)j-davis(dot)com>, Daniel Verite <daniel(at)manitou-mail(dot)org> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Built-in CTYPE provider |
Date: | 2024-01-09 22:17:44 |
Message-ID: | 2b98e5d9-6c51-41cf-8017-88a6a6a129bc@ardentperf.com |
Lists: | pgsql-hackers |
On 12/28/23 6:57 PM, Jeff Davis wrote:
> On Wed, 2023-12-27 at 17:26 -0800, Jeff Davis wrote:
> Attached a more complete version that fixes a few bugs, stabilizes the
> tests, and improves the documentation. I optimized the performance, too
> -- now it's beating both libc's "C.utf8" and ICU "en-US-x-icu" for both
> collation and case mapping (numbers below).
>
> It's really nice to finally be able to have platform-independent tests
> that work on any UTF-8 database.
I think we missed something in psql. I'm pretty sure I applied all the
patches, but I see this error:
=# \l
ERROR: 42703: column d.datlocale does not exist
LINE 8: d.datlocale as "Locale",
        ^
HINT: Perhaps you meant to reference the column "d.daticulocale".
LOCATION: errorMissingColumn, parse_relation.c:3720
=====
This is interesting. Jeff, your original email didn't explicitly show any
other initcap() results, but on Ubuntu 22.04 (glibc 2.35) I see
different results:
=# SELECT initcap('axxE áxxÉ DŽxxDŽ Džxxx džxxx');
initcap
--------------------------
Axxe Áxxé DŽxxdž DŽxxx DŽxxx
=# SELECT initcap('axxE áxxÉ DŽxxDŽ Džxxx džxxx' COLLATE C_UTF8);
initcap
--------------------------
Axxe Áxxé Džxxdž Džxxx Džxxx
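To make the difference concrete, here is a minimal Python sketch. It assumes that the glibc result comes from mapping each word's first letter to its uppercase form, while the builtin C_UTF8 result uses the Unicode titlecase mapping. It also assumes the digraphs in the query are the single code points U+01C4 (DŽ), U+01C5 (Dž) and U+01C6 (dž). `initcap_sketch` and `SAMPLE` are hypothetical names for illustration. The model ("map the first character of each alphanumeric run, lowercase the rest") is an approximation of initcap(), not PostgreSQL's actual code.

```python
def initcap_sketch(s: str, first_char_map) -> str:
    """Hypothetical model of initcap(): the first character of each run of
    alphanumerics goes through first_char_map, the rest are lowercased."""
    out = []
    prev_alnum = False
    for ch in s:
        out.append(ch.lower() if prev_alnum else first_char_map(ch))
        prev_alnum = ch.isalnum()
    return "".join(out)


# Assumption: the digraphs in the query are the single code points
# U+01C4 (DŽ, uppercase), U+01C5 (Dž, titlecase), U+01C6 (dž, lowercase).
SAMPLE = "axxE \u00e1xx\u00c9 \u01c4xx\u01c4 \u01c5xxx \u01c6xxx"

if __name__ == "__main__":
    # Uppercase mapping for word-initial letters, like the glibc output.
    print(initcap_sketch(SAMPLE, str.upper))
    # Unicode titlecase mapping (str.title on one char), like the C_UTF8 output.
    print(initcap_sketch(SAMPLE, str.title))
```

With `str.upper` the sketch reproduces the glibc line (DŽxxdž DŽxxx DŽxxx). With `str.title`, which applies the titlecase mapping to a lone character, it reproduces the C_UTF8 line (Džxxdž Džxxx Džxxx). Plain uppercasing has no way to produce the titlecase form U+01C5.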
The COLLATE SQL syntax feels awkward to me. In this example, we're just
using it to attach locale info to the string, and there isn't actually
any collation involved. I'm not sure if COLLATE comes from the
standard, and even if it does, I'm not sure whether the standard had
upper/lowercase in mind.
That said, I think the thing that mainly matters will be the CREATE
DATABASE syntax and the database default.
I want to try a few things with table-level defaults that differ from
database-level defaults, especially table-level ICU defaults, because I
think a number of PostgreSQL users set that up in the years before we
supported DB-level ICU. Some people will probably keep using their
old/existing schema-creation scripts even after they begin provisioning
new systems with new database-level defaults.
-Jeremy