From: | Jeremy Schneider <schneider(at)ardentperf(dot)com> |
---|---|
To: | Noah Misch <noah(at)leadboat(dot)com> |
Cc: | Jeff Davis <pgsql(at)j-davis(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Daniel Verite <daniel(at)manitou-mail(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Built-in CTYPE provider |
Date: | 2024-07-06 20:37:06 |
Message-ID: | E8754F74-C65F-4A1A-826F-FD9F37599A2E@ardentperf.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
>
> On Jul 6, 2024, at 12:51 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> Behavior after that:
>
> -- 2 rows w/ seq scan, 0 rows w/ index scan
> SELECT 1 FROM t WHERE s ~ '[[:alpha:]]';
> SET enable_seqscan = off;
> SELECT 1 FROM t WHERE s ~ '[[:alpha:]]';
>
> -- ERROR: heap tuple (0,1) from table "t" lacks matching index tuple within index "iexpr"
> SELECT bt_index_parent_check('iexpr', heapallindexed => true);
> -- ERROR: heap tuple (0,1) from table "t" lacks matching index tuple within index "ipred"
> SELECT bt_index_parent_check('ipred', heapallindexed => true);
Other databases do still ship built-in ancient versions of unicode (Db2 ships 4.0+ and Oracle ships 6.1+), and they have added new Unicode versions alongside the old but not removed the old versions. They claim to have “deprecated” old versions… but it seems they haven’t been able to get rid of them yet. Maybe some customer is willing to pay to continue deferring painful rebuilds needed to get rid of the old collation versions in commercial DBs?
For reference, see the table on slide 56 at https://www.pgevents.ca/events/pgconfdev2024/schedule/session/95-collations-from-a-to-z/ and also see https://ardentperf.com/2024/05/22/default-sort-order-in-db2-sql-server-oracle-postgres-17/
Thanks for the illustration with actual Unicode 16 draft data.
Also, not directly related to this email… but reiterating a point I argued for in the recorded talk at pgconf.dev in Vancouver: a very strong argument for having the DB default to a stable unchanging built-in collation is that the dependency tracking makes it easy to identify objects in the database using non-default collations, and it’s easy to know exactly what needs to be rebuilt for a user to safely change some non-default collation provider’s behavior.
-Jeremy
Sent from my TI-83
From | Date | Subject | |
---|---|---|---|
Next Message | Noah Misch | 2024-07-06 22:46:47 | Re: Vectored I/O in bulk_write.c |
Previous Message | Tom Lane | 2024-07-06 20:19:21 | Re: Built-in CTYPE provider |