From: | Peter Eisentraut <peter(at)eisentraut(dot)org> |
---|---|
To: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | fixing tsearch locale support |
Date: | 2024-12-02 10:57:05 |
Message-ID: | 653f3b84-fc87-45a7-9a0c-bfb4fcab3e7d@eisentraut.org |
Lists: | pgsql-hackers |
Infamously, the tsearch locale support in
src/backend/tsearch/ts_locale.c still depends on libc environment
variable locale settings and has not caught up with pg_locale_t,
collations, ICU, and all that newer stuff. This is used in the tsearch
facilities themselves, but also in other modules such as ltree, pg_trgm,
and unaccent.
Several of the functions are wrappers around <ctype.h> functions, like

    int
    t_isalpha(const char *ptr)
    {
        int         clen = pg_mblen(ptr);
        wchar_t     character[WC_BUF_LEN];
        pg_locale_t mylocale = 0;   /* TODO */

        if (clen == 1 || database_ctype_is_c)
            return isalpha(TOUCHAR(ptr));

        char2wchar(character, WC_BUF_LEN, ptr, clen, mylocale);

        return iswalpha((wint_t) character[0]);
    }

So this has multibyte and encoding awareness, but does not observe
locale provider or collation settings.
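
To make that dependence easier to see outside the backend, here is a rough standalone analog. It is only a sketch: it assumes the standard mbrtowc() in place of pg_mblen() and char2wchar(), and sketch_isalpha() is a hypothetical name, not anything in the tree. The point is that both branches consult whatever process-wide libc locale is in effect; no collation or locale provider is passed in.

```c
#include <ctype.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical standalone analog of t_isalpha().  Classifies the first
 * character of ptr using only the current libc locale (LC_CTYPE).
 */
static int
sketch_isalpha(const char *ptr)
{
    mbstate_t   state;
    wchar_t     wc;
    size_t      clen;

    memset(&state, 0, sizeof(state));
    clen = mbrtowc(&wc, ptr, strlen(ptr), &state);

    /* Single byte, empty input, or undecodable: fall back to <ctype.h>. */
    if (clen == (size_t) -1 || clen == (size_t) -2 || clen <= 1)
        return isalpha((unsigned char) *ptr);

    /* Multibyte character: classify the decoded wide character. */
    return iswalpha((wint_t) wc);
}
```

The answer for a non-ASCII character changes with setlocale(), which is exactly the behavior that bypasses pg_locale_t.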
As an easy start toward fixing this, I think we don't even need several
of these functions.
t_isdigit() and t_isspace() are just used to parse various configuration
and data files, and surely we don't need encoding-dependent multibyte
support for parsing ASCII digits and ASCII spaces. At least,
I didn't find any indications in the documentation of these file formats
that they are supposed to support that kind of thing. So these can be
replaced by the normal isdigit() and isspace().
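
As a minimal sketch of what byte-wise parsing with the plain <ctype.h> functions looks like (parse_ascii_uint() is a hypothetical helper for illustration, not code from the attached patches): each byte is cast to unsigned char before the call, because passing a negative char value to isdigit() or isspace() is undefined behavior.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper: skip leading whitespace, then parse a run of
 * ASCII digits.  Returns the number of bytes consumed, or 0 if no digit
 * was found (in which case *result is left untouched).
 */
static size_t
parse_ascii_uint(const char *str, unsigned int *result)
{
    const char *p = str;
    unsigned int value = 0;

    while (isspace((unsigned char) *p))
        p++;

    if (!isdigit((unsigned char) *p))
        return 0;

    while (isdigit((unsigned char) *p))
    {
        value = value * 10 + (unsigned int) (*p - '0');
        p++;
    }

    *result = value;
    return (size_t) (p - str);
}
```

Bytes belonging to a multibyte character are simply not digits here, so parsing stops at them without any encoding lookup.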
There is one call to t_isprint(), which is similarly used only to parse
some flags in a configuration file. From the surrounding code you can
deduce that it's only called on single-byte characters, so it can
similarly be replaced by plain isprint().
Note, pg_trgm has some compile-time options with macros such as
KEEPONLYALNUM and IGNORECASE. AFAICT, these are not documented, and the
non-default variant is not supported by any test cases. So as part of
this undertaking, I'm going to remove the non-default variants if they
are in the way of cleanup.
Attachment | Content-Type | Size |
---|---|---|
0001-Remove-t_isdigit.patch | text/plain | 4.8 KB |
0002-Remove-t_isspace.patch | text/plain | 16.1 KB |
0003-Remove-t_isprint.patch | text/plain | 2.0 KB |