From: | Jeff Davis <pgsql(at)j-davis(dot)com> |
---|---|
To: | Peter Eisentraut <peter(at)eisentraut(dot)org>, Daniel Verite <daniel(at)manitou-mail(dot)org> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Jeremy Schneider <schneider(at)ardentperf(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Built-in CTYPE provider |
Date: | 2024-03-14 20:42:20 |
Message-ID: | 7451f81ba0cb512222ab759de8ca1cffe44e9acb.camel@j-davis.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, 2024-03-14 at 09:54 +0100, Peter Eisentraut wrote:
> * doc/src/sgml/charset.sgml
>
> I don't understand the purpose of this sentence:
>
> "When using this locale, the behavior may depend on the database
> encoding."
The "C" locale (in either the builtin or libc provider) can sort
differently in different encodings, because it's based on memcmp. For
instance:
select U&'\20AC' > U&'\201A' collate "C";
Returns true in UTF-8 and false in WIN1252. That's why UCS_BASIC is
only available in UTF-8, because (at least for some encodings) we'd
have to decode before comparison to get the code-point-order semantics
right.
In other words, the "C" collation is not a well-defined order, but
UCS_BASIC and C.UTF-8 are well-defined.
Suggestions for better wording are welcome.
> * doc/src/sgml/ref/create_database.sgml
>
> The new parameter builtin_locale is not documented.
Thank you, fixed in 0001 (review fixup).
> * src/backend/commands/collationcmds.c
>
> I think DefineCollation() should set collencoding = -1 for the
> COLLPROVIDER_BUILTIN case. -1 stands for any encoding. Or at least
> explain why not?
In the attached v25-0001 (review fixup) I have made it the
responsibility of a function, and then extended that for the C.UTF-8
(0002) and PG_UNICODE_FAST locales (0007).
> * src/backend/utils/adt/pg_locale.c
>
> This part is a bit confusing:
>
> + cache_entry->collate_is_c = true;
> + cache_entry->ctype_is_c = (strcmp(colllocale, "C") == 0);
>
> Is collate always C but ctype only sometimes? Does this anticipate
> future patches in this series? Maybe in this patch it should always
> be true?
Made it a constant in v25-0001, and changed it in 0002
>
> * src/bin/initdb/initdb.c
>
> + printf(_(" --builtin-locale=LOCALE set builtin locale name
> for new databases\n"));
>
> Put in a line break so that the right "column" lines up.
Fixed in 0001
> This output should line up better:
>
> The database cluster will be initialized with this locale
> configuration:
> default collation provider: icu
> default collation locale: en
> LC_COLLATE: C
> LC_CTYPE: C
> ...
>
> Also, why are there two spaces after "provider: "?
>
> Also we call these locale provider on input, why are they collation
> providers on output? What is a "collation locale"?
I tried to fix these things in 0001.
> * src/bin/pg_upgrade/t/002_pg_upgrade.pl
>
> +if ($oldnode->pg_version >= '17devel')
>
> This is weird. >= is a numeric comparison, so providing a string
> with
> non-digits is misleading at best.
It's actually not a numeric comparison, it's an overloaded comparison
op for the Version class.
See 32dd2c1eff and:
https://www.postgresql.org/message-id/1738174.1710274577%40sss.pgh.pa.us
> * src/test/icu/t/010_database.pl
>
> -# Test that LOCALE works for ICU locales if LC_COLLATE and LC_CTYPE
> -# are specified
>
> Why remove this test?
It must have been lost during a rebase, fixed in 0001.
> Change the name of the new database to be different from the name of
> the template database.
Fixed in 0001.
New series attached.
Regards,
Jeff Davis
Attachment | Content-Type | Size |
---|---|---|
v25-0001-Address-more-review-comments-on-commit-2d819a08a.patch | text/x-patch | 7.9 KB |
v25-0002-Support-C.UTF-8-locale-in-the-new-builtin-collat.patch | text/x-patch | 29.8 KB |
v25-0003-Inline-basic-UTF-8-functions.patch | text/x-patch | 4.4 KB |
v25-0004-Use-version-for-builtin-collations.patch | text/x-patch | 1.8 KB |
v25-0005-Add-unicode_strtitle-for-Unicode-Default-Case-Co.patch | text/x-patch | 9.4 KB |
v25-0006-Support-Unicode-full-case-mapping-and-conversion.patch | text/x-patch | 555.7 KB |
v25-0007-Support-PG_UNICODE_FAST-locale-in-the-builtin-co.patch | text/x-patch | 18.6 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Jeff Davis | 2024-03-14 20:42:28 | Re: Built-in CTYPE provider |
Previous Message | David Steele | 2024-03-14 20:40:38 | Re: Add basic tests for the low-level backup method. |