Re: Built-in CTYPE provider

From: Jeremy Schneider <schneider(at)ardentperf(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>, Daniel Verite <daniel(at)manitou-mail(dot)org>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Built-in CTYPE provider
Date: 2024-01-09 22:17:44
Message-ID: 2b98e5d9-6c51-41cf-8017-88a6a6a129bc@ardentperf.com
Lists: pgsql-hackers

On 12/28/23 6:57 PM, Jeff Davis wrote:
> On Wed, 2023-12-27 at 17:26 -0800, Jeff Davis wrote:
> Attached a more complete version that fixes a few bugs, stabilizes the
> tests, and improves the documentation. I optimized the performance, too
> -- now it's beating both libc's "C.utf8" and ICU "en-US-x-icu" for both
> collation and case mapping (numbers below).
>
> It's really nice to finally be able to have platform-independent tests
> that work on any UTF-8 database.

I think we missed something in psql. I'm pretty sure I applied all the
patches, but I see this error:

=# \l
ERROR: 42703: column d.datlocale does not exist
LINE 8: d.datlocale as "Locale",
^
HINT: Perhaps you meant to reference the column "d.daticulocale".
LOCATION: errorMissingColumn, parse_relation.c:3720

=====

This is interesting. Jeff, your original email didn't explicitly show any
other initcap() results, but on Ubuntu 22.04 (glibc 2.35) I see
different results:

=# SELECT initcap('axxE áxxÉ DŽxxDŽ Džxxx džxxx');
initcap
--------------------------
Axxe Áxxé DŽxxdž DŽxxx DŽxxx

=# SELECT initcap('axxE áxxÉ DŽxxDŽ Džxxx džxxx' COLLATE C_UTF8);
initcap
--------------------------
Axxe Áxxé Džxxdž Džxxx Džxxx
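
The two outputs above differ only in how the first character of each
word is mapped: to its uppercase form (U+01C4) in the first query, or to
its titlecase form (U+01C5) in the second. A minimal sketch in Python
that reproduces both strings, assuming the first result comes from
"uppercase the first character, lowercase the rest" and the second from
"titlecase the first character, lowercase the rest". The helper names
are hypothetical; this is not how the patch or libc implements it, only
an illustration using Python's own Unicode tables:

```python
# Code points for the Latin digraph used in the example.
DZ_UPPER = "\u01c4"  # uppercase form
DZ_TITLE = "\u01c5"  # titlecase form
DZ_LOWER = "\u01c6"  # lowercase form

# Same input string as in the initcap() queries above.
SAMPLE = (
    f"axxE \u00e1xx\u00c9 {DZ_UPPER}xx{DZ_UPPER} "
    f"{DZ_TITLE}xxx {DZ_LOWER}xxx"
)


def initcap_upper_first(text: str) -> str:
    """Uppercase the first character of each word, lowercase the rest.

    Assumption: this models the behavior seen in the first query.
    """
    return " ".join(w[:1].upper() + w[1:].lower() for w in text.split(" "))


def initcap_title_first(text: str) -> str:
    """Titlecase the first character of each word, lowercase the rest.

    Assumption: this models the behavior seen with C_UTF8.
    """
    return " ".join(w[:1].title() + w[1:].lower() for w in text.split(" "))


if __name__ == "__main__":
    print(initcap_upper_first(SAMPLE))
    print(initcap_title_first(SAMPLE))
```

For most letters the uppercase and titlecase mappings are identical,
which is why only the digraph words differ between the two results.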

The COLLATE SQL syntax feels awkward to me. In this example, we're just
using it to attach locale info to the string, and there's not actually
any collation involved here. I'm not sure whether COLLATE comes from the
standard, and even if it does, I'm not sure whether the standard had
upper/lowercase in mind.

That said, I think the thing that mainly matters will be the CREATE
DATABASE syntax and the database default.

I want to try a few things with table-level defaults that differ from
database-level defaults, especially table-level ICU defaults because I
think a number of PostgreSQL users set that up in the years before we
supported DB-level ICU. Some people will probably keep using their
old/existing schema-creation scripts even after they begin provisioning
new systems with new database-level defaults.

-Jeremy

--
http://about.me/jeremy_schneider
