From: | Noah Misch <noah(at)leadboat(dot)com> |
---|---|
To: | Jeff Davis <pgsql(at)j-davis(dot)com> |
Cc: | Peter Eisentraut <peter(at)eisentraut(dot)org>, Daniel Verite <daniel(at)manitou-mail(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jeremy Schneider <schneider(at)ardentperf(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Built-in CTYPE provider |
Date: | 2024-07-06 19:51:29 |
Message-ID: | 20240706195129.fd@rfd.leadboat.com |
Lists: | pgsql-hackers |
On Fri, Jul 05, 2024 at 02:38:45PM -0700, Jeff Davis wrote:
> On Thu, 2024-07-04 at 14:26 -0700, Noah Misch wrote:
> > I think you're saying that if some Unicode update changes the results
> > of a
> > STABLE function but does not change the result of any IMMUTABLE
> > function, we
> > may as well import that update. Is that about right? If so, I
> > agree.
>
> If you are proposing that Unicode updates should not be performed if
> they affect the results of any IMMUTABLE function, then that's a new
> policy.
>
> For instance, the results of NORMALIZE() changed from PG15 to PG16 due
> to commit 1091b48cd7:
>
> SELECT NORMALIZE(U&'\+01E030',nfkc)::bytea;
>
> Version 15: \xf09e80b0
>
> Version 16: \xd0b0

As a released feature, NORMALIZE() has a different set of remedies to choose
from, and I'm not proposing one. I may have sidetracked this thread by
talking about remedies without an agreement that pg_c_utf8 has a problem. My
question for the PostgreSQL maintainers is this:

textregexeq(... COLLATE pg_c_utf8, '[[:alpha:]]') and lower(), despite being
IMMUTABLE, will change behavior in some major releases. pg_upgrade does not
have a concept of IMMUTABLE functions changing, so index scans will return
wrong query results after upgrade. Is it okay for v17 to release a
pg_c_utf8 planned to behave that way when upgrading v17 to v18+?

If the answer is yes, the open item closes. If the answer is no, determining
the remedy can come next.

In case concrete details help anyone reading, here are some affected objects:

CREATE TABLE t (s text COLLATE pg_c_utf8);
INSERT INTO t VALUES (U&'\+00a7dc'), (U&'\+001dd3');
CREATE INDEX iexpr ON t ((lower(s)));
CREATE INDEX ipred ON t (s) WHERE s ~ '[[:alpha:]]';

v17 can simulate the Unicode aspect of a v18 upgrade, like this:

sed -i 's/^UNICODE_VERSION.*/UNICODE_VERSION = 16.0.0/' src/Makefile.global.in
# ignore test failures (your ICU likely doesn't have the Unicode 16.0.0 draft)
make -C src/common/unicode update-unicode
make
make install
pg_ctl restart

Behavior after that:

-- 2 rows w/ seq scan, 0 rows w/ index scan
SELECT 1 FROM t WHERE s ~ '[[:alpha:]]';
SET enable_seqscan = off;
SELECT 1 FROM t WHERE s ~ '[[:alpha:]]';

-- ERROR: heap tuple (0,1) from table "t" lacks matching index tuple within index "iexpr"
SELECT bt_index_parent_check('iexpr', heapallindexed => true);

-- ERROR: heap tuple (0,1) from table "t" lacks matching index tuple within index "ipred"
SELECT bt_index_parent_check('ipred', heapallindexed => true);
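
To see the underlying change directly, a minimal sketch, assuming the table t
above: run the query below once before and once after the simulated update and
compare the two outputs. The column aliases are illustrative only. Any row
whose values differ between the runs is one that the existing index entries,
built under the old Unicode data, no longer describe.

```sql
-- Show the two IMMUTABLE results that the indexes iexpr and ipred depend on.
-- lower(s) feeds the expression index; the regex match feeds the partial
-- index predicate.  Both use the column's pg_c_utf8 collation implicitly.
SELECT s,
       lower(s)::bytea   AS lowered_bytes,  -- bytes make case changes visible
       s ~ '[[:alpha:]]' AS is_alpha
FROM t
ORDER BY s::bytea;                          -- stable order for comparing runs
```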