Re: Unicode FFFF Special Codepoint should always collate high.

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Telford Tendys <psql(at)lnx-bsp(dot)net>
Cc: PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Unicode FFFF Special Codepoint should always collate high.
Date: 2021-06-23 06:18:39
Message-ID: CA+hUKGJF0C0_rch1emO7id0QxDNa2s0pPtQCVjuTPM3mqLs+hg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Wed, Jun 23, 2021 at 3:00 PM Telford Tendys <psql(at)lnx-bsp(dot)net> wrote:
> Thank you for taking a look at it, you seem to have confirmed that
> this is coming from the system itself. Yes, my purpose is to do
> prefix searching on strings by specifying a range and taking advantage
> of a B-Tree index, exactly as described in the quote above.

Just in case you didn't know, PostgreSQL knows how to convert a prefix
search for 'aar%' into a range search with a related trick in some
limited circumstances:

tmunro=> create table t (name text);
CREATE TABLE
tmunro=> insert into t values ('aardvark'), ('buffalo'), ('cat');
INSERT 0 3
tmunro=> create index on t(name text_pattern_ops);
CREATE INDEX
tmunro=> analyze t;
ANALYZE
tmunro=> explain select * from t where name like 'aar%';
QUERY PLAN
-------------------------------------------------------------------------
Index Only Scan using t_name_idx on t (cost=0.13..8.15 rows=1 width=7)
Index Cond: ((name ~>=~ 'aar'::text) AND (name ~<~ 'aas'::text))
Filter: (name ~~ 'aar%'::text)
(3 rows)

> https://bugzilla.redhat.com/show_bug.cgi?id=1975045

Seems a little light on references and justifications for the
expectiation. ICU's results could be useful in the discussion. It's
interesting that more recent Unicode versions removed the prohibition
on using these code points at all (some earlier version said that if
your string contained them, they weren't Unicode, apparently,
according to my quick read of
https://en.wikipedia.org/wiki/Specials_(Unicode_block) but I didn't
have time to look into any source documents).

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Vladimir Shvartsgor 2021-06-23 06:40:32 Example in "42.8. Transaction Management" doesn't work for PostgreSQL v 12.7
Previous Message Telford Tendys 2021-06-23 03:24:18 Re: Unicode FFFF Special Codepoint should always collate high.