Re: Built-in CTYPE provider

From: Noah Misch <noah(at)leadboat(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Peter Eisentraut <peter(at)eisentraut(dot)org>, Daniel Verite <daniel(at)manitou-mail(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jeremy Schneider <schneider(at)ardentperf(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Built-in CTYPE provider
Date: 2024-07-06 19:51:29
Message-ID: 20240706195129.fd@rfd.leadboat.com
Lists: pgsql-hackers

On Fri, Jul 05, 2024 at 02:38:45PM -0700, Jeff Davis wrote:
> On Thu, 2024-07-04 at 14:26 -0700, Noah Misch wrote:
> > I think you're saying that if some Unicode update changes the results of a
> > STABLE function but does not change the result of any IMMUTABLE function, we
> > may as well import that update.  Is that about right?  If so, I agree.
>
> If you are proposing that Unicode updates should not be performed if
> they affect the results of any IMMUTABLE function, then that's a new
> policy.
>
> For instance, the results of NORMALIZE() changed from PG15 to PG16 due
> to commit 1091b48cd7:
>
> SELECT NORMALIZE(U&'\+01E030',nfkc)::bytea;
>
> Version 15: \xf09e80b0
>
> Version 16: \xd0b0

As a released feature, NORMALIZE() has a different set of remedies to choose
from, and I'm not proposing one. I may have sidetracked this thread by
talking about remedies without an agreement that pg_c_utf8 has a problem. My
question for the PostgreSQL maintainers is this:

textregexeq(... COLLATE pg_c_utf8, '[[:alpha:]]') and lower(), despite being
IMMUTABLE, will change behavior in some major releases. pg_upgrade does not
have a concept of IMMUTABLE functions changing, so index scans will return
wrong query results after upgrade. Is it okay for v17 to release a
pg_c_utf8 planned to behave that way when upgrading v17 to v18+?

If the answer is yes, the open item closes. If the answer is no, determining
the remedy can come next.

In case concrete details help anyone reading, here are some affected objects:

CREATE TABLE t (s text COLLATE pg_c_utf8);
INSERT INTO t VALUES (U&'\+00a7dc'), (U&'\+001dd3');
CREATE INDEX iexpr ON t ((lower(s)));
CREATE INDEX ipred ON t (s) WHERE s ~ '[[:alpha:]]';

v17 can simulate the Unicode aspect of a v18 upgrade, like this:

sed -i 's/^UNICODE_VERSION.*/UNICODE_VERSION = 16.0.0/' src/Makefile.global.in
# ignore test failures (your ICU likely doesn't have the Unicode 16.0.0 draft)
make -C src/common/unicode update-unicode
make
make install
pg_ctl restart

Behavior after that:

-- 2 rows w/ seq scan, 0 rows w/ index scan
SELECT 1 FROM t WHERE s ~ '[[:alpha:]]';
SET enable_seqscan = off;
SELECT 1 FROM t WHERE s ~ '[[:alpha:]]';

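-- bt_index_parent_check() is provided by the amcheck extension
CREATE EXTENSION IF NOT EXISTS amcheck;
-- ERROR: heap tuple (0,1) from table "t" lacks matching index tuple within index "iexpr"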
SELECT bt_index_parent_check('iexpr', heapallindexed => true);
-- ERROR: heap tuple (0,1) from table "t" lacks matching index tuple within index "ipred"
SELECT bt_index_parent_check('ipred', heapallindexed => true);
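
For readers without a patched build at hand, the two failure modes above can be sketched outside PostgreSQL. What follows is a toy model in Python, not PostgreSQL code: the letter "Q" is a stand-in for a code point that is unassigned before the Unicode update and becomes an uppercase letter after it, and every function and variable name is hypothetical. It only illustrates why an index built under the old behavior and probed under the new behavior disagrees with a sequential scan.

```python
# Toy model only. "Q" stands in for a code point that the Unicode update
# newly assigns as an uppercase letter. Nothing here is PostgreSQL code.

def lower_old(s: str) -> str:
    """Pretend pre-upgrade lower(): "Q" has no case mapping yet."""
    return "".join(c if c == "Q" else c.lower() for c in s)

def lower_new(s: str) -> str:
    """Pretend post-upgrade lower(): "Q" now lowercases to "q"."""
    return s.lower()

def has_alpha_old(s: str) -> bool:
    """Pretend pre-upgrade '[[:alpha:]]' match: "Q" is not alphabetic yet."""
    return any(c.isalpha() and c != "Q" for c in s)

def has_alpha_new(s: str) -> bool:
    """Pretend post-upgrade '[[:alpha:]]' match: "Q" is alphabetic."""
    return any(c.isalpha() for c in s)

rows = ["Q"]  # the table contents; the list position plays the role of a row id

# Both "indexes" are built before the upgrade and carried over unchanged,
# which is what happens when nothing tells the upgrade to rebuild them.
iexpr = {}  # expression index: lower(s) -> row ids
for rowid, s in enumerate(rows):
    iexpr.setdefault(lower_old(s), []).append(rowid)
ipred = [rowid for rowid, s in enumerate(rows) if has_alpha_old(s)]  # partial index

# After the upgrade, queries evaluate the new functions.
key = lower_new("Q")
expr_seq = [rowid for rowid, s in enumerate(rows) if lower_new(s) == key]
expr_idx = iexpr.get(key, [])  # probes stale keys computed by lower_old()
pred_seq = [rowid for rowid, s in enumerate(rows) if has_alpha_new(s)]
pred_idx = list(ipred)  # the scan trusts the index to hold every matching row

print("expression index: seq scan", expr_seq, "index scan", expr_idx)
print("partial index:    seq scan", pred_seq, "index scan", pred_idx)
```

In this model the sequential scans find the row and the index scans do not, which mirrors the "2 rows w/ seq scan, 0 rows w/ index scan" result shown earlier.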
