From: | Jeff Davis <pgsql(at)j-davis(dot)com> |
---|---|
To: | Jeremy Schneider <schneider(at)ardentperf(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Built-in CTYPE provider |
Date: | 2023-12-21 22:24:01 |
Message-ID: | 7774b3a64f51b3375060c29871cf2b02b3e85dab.camel@j-davis.com |
Lists: | pgsql-hackers |
On Wed, 2023-12-20 at 16:29 -0800, Jeremy Schneider wrote:
> found some more. here's my running list of everything user-facing I
> see in core PG code so far that might involve case:
>
> * upper/lower/initcap
> * regexp_*() and *_REGEXP()
> * ILIKE, operators ~* !~* ~~ !~~ ~~* !~~*
> * citext + replace(), split_part(), strpos() and translate()
> * full text search - everything is case folded
> * unaccent? not clear to me whether CTYPE includes accent folding
No, ctype has nothing to do with accents as far as I can tell. I don't
know if I'm using the right terminology, but I think "case" is a
variant of a character whereas "accent" is a modifier/mark, and the
mark is a separate concept from the character itself.
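To illustrate the distinction, here is a minimal sketch in Python (not
PostgreSQL code; split_marks() is a made-up helper name). It assumes
Unicode NFD normalization is a fair way to look at this: decomposition
pulls the accent out as a separate combining mark, while case mapping
changes the base character and leaves the mark alone.

```python
import unicodedata


def split_marks(s):
    """Hypothetical helper: NFD-decompose s, then separate base
    characters from combining marks (accents and the like)."""
    decomposed = unicodedata.normalize("NFD", s)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    marks = [unicodedata.name(c) for c in decomposed if unicodedata.combining(c)]
    return base, marks


# U+00E9 (e with acute) decomposes into a base letter plus a mark.
print(split_marks("\u00e9"))           # ('e', ['COMBINING ACUTE ACCENT'])

# Lowercasing U+00C9 (E with acute) changes the case of the base
# letter only; the accent is still there afterwards.
print(split_marks("\u00c9".lower()))   # ('e', ['COMBINING ACUTE ACCENT'])
```

Removing the mark is a separate operation from changing case, which is
roughly what unaccent does and what ctype does not.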
> * ltree
> * pg_trgm
> * core PG parser, case folding of relation names
Let's separate it into groups.
(1) Callers that use a collation OID or pg_locale_t:
* collation & hashing
* upper/lower/initcap
* regex, LIKE, formatting
* pg_trgm (which uses regexes)
* maybe postgres_fdw, but might just be a passthrough
* catalog cache (always uses DEFAULT_COLLATION_OID)
* citext (always uses DEFAULT_COLLATION_OID, but probably shouldn't)
(2) A long tail of callers that depend on what LC_CTYPE/LC_COLLATE are
set to, or use ad-hoc ASCII-only semantics:
* core SQL parser downcase_identifier()
* callers of pg_strcasecmp() (DDL, etc.)
* GUC name case folding
* full text search ("mylocale = 0 /* TODO */")
* a ton of stuff uses isspace(), isdigit(), etc.
* various callers of tolower()/toupper()
* some selfuncs.c stuff
* ...
Might have missed some places.
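As a rough illustration of why the two flavors in (2) behave
differently, here is a sketch in Python rather than C. ascii_tolower()
is a made-up stand-in for ASCII-only folding, and Python's str.lower()
stands in for a Unicode-aware lowercase; neither is the actual
PostgreSQL implementation.

```python
def ascii_tolower(s):
    """Hypothetical ASCII-only folding: map A-Z to a-z and leave
    every other character untouched, whatever the locale is."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


# Pure ASCII input: both approaches agree.
print(ascii_tolower("MyTable"))        # mytable
print("MyTable".lower())               # mytable

# Non-ASCII input (U+00C4, A with diaeresis, followed by B):
# the results diverge.
print(ascii_tolower("\u00c4B") == "\u00c4b")   # True: only B was folded
print("\u00c4B".lower() == "\u00e4b")          # True: both were folded
```

The ASCII-only result never depends on the environment, which is the
appeal; the cost is that non-ASCII letters are simply not folded.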
A new builtin provider would affect the callers in (1), but only for
users who actually choose that provider. So there's no compatibility
risk there, but it's good to understand what it will affect.
We can, on a case-by-case basis, also consider using the new APIs I'm
proposing for instances of (2). There would be some compatibility risk
there for existing callers, and we'd have to consider whether it's
worth it or not. Ideally, new callers would either use the new APIs or
use the pg_ascii_* APIs.
Regards,
Jeff Davis