Re: Collation & ctype method table, and extension hooks

From: Andreas Karlsson <andreas(at)proxel(dot)se>
To: Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Collation & ctype method table, and extension hooks
Date: 2024-10-24 08:05:24
Message-ID: 59da7ee4-5e1a-4727-b464-a603c6ed84cd@proxel.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 10/10/24 1:27 AM, Jeff Davis wrote:
> Attached v6 with significant improvements, and should be easier to
> review.
>
> This removes all runtime branching for collation & ctype operations; I
> even removed the "provider" field of pg_locale_t to be sure.

Nice! Some great changes. I did a quick review:

= General

Why is there no pg_locale_builtin.c? I feel the code would be easier to
understand for someone not familiar with it if each provider was defined
in its own file.

= v6-0003-Refactor-the-code-to-create-a-pg_locale_t-into-ne.patch

Looks good.

= v6-0004-Perform-provider-specific-initialization-code-in-.patch

I am not a fan of all the #ifdef USE_ICU in pg_locale_icu.c but I am
not sure if I have a cleaner solution.

= v6-0005-Control-collation-behavior-with-a-method-table.patch

strncoll_libc_win32_utf8 is used in one patch but then later defined in
the next patch. So seems like you accidentally added that to the wrong
patch,

I think adding an assert to create_pg_locale() which enforces valid
there is always a combination of collate_is_c and collate would be good.
Especially when we have the hook.

= v6-0006-Control-case-mapping-behavior-with-a-method-table.patch

I think you forgot to remove #include <wctype.h> from formatting.c.

I need to look at it more in detail but I think this new version makes
us do extra work when ICU strings grow in length when calling upper/lower.

I think adding an assert to create_pg_locale() which enforces valid
there is always a combination of ctype_is_c and casemap would be good,
similar to the collate field.

= v6-0007-Control-ctype-behavior-with-a-method-table.patch

Why are casemap and ctype_methods not the same struct? They seem very
closely related.

This commit makes me tempted to handle the ctype_is_c logic for
character classes also in callbacks and remove the if in functions like
pg_wc_ispunct(). But this si something that would need to be benchmarked.

I wonder if the bitmask idea isn't terrible for the branch predictor and
that me may want one function per character class, but this is yet again
something we need to benchmark.

= v6-0008-Remove-provider-field-from-pg_locale_t.patch

Looks good.

= v6-0009-Make-provider-data-in-pg_locale_t-an-opaque-point.patch

Is there a reason we allocate the icu_provider in create_pg_locale_icu
with MemoryContextAllocZero when we intialize everything anyway? And
similar for other providers.

= v6-0010-Don-t-include-ICU-headers-in-pg_locale.h.patch

Looks good.

= v6-0011-Introduce-hooks-for-creating-custom-pg_locale_t.patch

Looks good but seems like a quite painful API to use.

> * Have a CREATE LOCALE PROVIDER command and make "provider" an Oid
> rather than a char ('b'/'i'/'c'). The v6 patches brings us close to
> this point, but I'm not sure if we want to go this far in v18.

Probably necessary but I hate all the DDL commands the way to SQL
standard is written forces us to add.

> * Need an actual extension to prove that it works.
>
> * Clean up the way versions are handled.
>
> * Do we want to provide support for changing the provider at initdb
> time?

Not sure, need to think about this one.

> * The catalog representation is not very clean or general. The libc
> provider allows collation and ctype to be set separately, but they
> control the environment variables, too. ICU has rules, which are
> specific to ICU.

Yeah, would be really nice to clean this up but it might be work for a
different patch set.

Rebased patches are attached.

Andreas

Attachment Content-Type Size
v1-0001-Specialize-EEOP_-_TESTVAL-steps.patch text/x-patch 9.0 KB
v7-0001-Refactor-the-code-to-create-a-pg_locale_t-into-ne.patch text/x-patch 11.3 KB
v7-0002-Perform-provider-specific-initialization-code-in-.patch text/x-patch 16.6 KB
v7-0003-Control-collation-behavior-with-a-method-table.patch text/x-patch 18.4 KB
v7-0004-Control-case-mapping-behavior-with-a-method-table.patch text/x-patch 35.7 KB
v7-0005-Control-ctype-behavior-with-a-method-table.patch text/x-patch 31.0 KB
v7-0006-Remove-provider-field-from-pg_locale_t.patch text/x-patch 4.7 KB
v7-0007-Make-provider-data-in-pg_locale_t-an-opaque-point.patch text/x-patch 21.6 KB
v7-0008-Don-t-include-ICU-headers-in-pg_locale.h.patch text/x-patch 3.7 KB
v7-0009-Introduce-hooks-for-creating-custom-pg_locale_t.patch text/x-patch 6.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Laurenz Albe 2024-10-24 08:29:02 Re: proposal: schema variables
Previous Message Haoran Zhang 2024-10-24 07:15:51 Typo spotted in GetFileBackupMethod comment(?)