Re: Support LIKE with nondeterministic collations

From:	Peter Eisentraut <peter(at)eisentraut(dot)org>
To:	Daniel Verite <daniel(at)manitou-mail(dot)org>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Pgsql-Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support LIKE with nondeterministic collations
Date:	2024-05-03 18:53:52
Message-ID:	b32cefe2-b9e2-499e-b919-fe8f21c5bc22@eisentraut.org
Lists:	pgsql-hackers

On 03.05.24 16:58, Daniel Verite wrote:
> * Generating bounds for a sort key (prefix matching)
>
> Having sort keys for strings allows for easy creation of bounds -
> sort keys that are guaranteed to be smaller or larger than any sort
> key from a given range. For example, if bounds are produced for a
> sortkey of string “smith”, strings between upper and lower bounds
> with one level would include “Smith”, “SMITH”, “sMiTh”. Two kinds
> of upper bounds can be generated - the first one will match only
> strings of equal length, while the second one will match all the
> strings with the same initial prefix.
>
> CLDR 1.9/ICU 4.6 and later map U+FFFF to a collation element with
> the maximum primary weight, so that for example the string
> “smith\uFFFF” can be used as the upper bound rather than modifying
> the sort key for “smith”.
>
> In other words it says that
>
> col LIKE 'smith%' collate "nd"
>
> is equivalent to:
>
> col >= 'smith' collate "nd" AND col < U&'smith\ffff' collate "nd"
>
> which could be obtained from an index scan, assuming a btree
> index on "col" collate "nd".
>
> U+FFFF is a valid code point but a "non-character" [1] so it's
> not supposed to be present in normal strings.

Thanks, this could be very useful!
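
To make the quoted equivalence concrete, here is a minimal sketch in Python. It is a toy model only: `sort_key` is a hypothetical stand-in that uses `str.casefold()` in place of a real ICU level-1 sort key, and `prefix_scan` mimics a btree range scan with `bisect`. None of these names come from PostgreSQL or ICU, and the toy relies on plain code point ordering, so it only holds for characters below U+FFFF.

```python
from bisect import bisect_left


def sort_key(s):
    # Assumption: a case-insensitive key standing in for an ICU
    # level-1 sort key. Real collations also handle accents etc.
    return s.casefold()


def prefix_range(prefix):
    # Lower bound: key of the prefix itself.
    # Upper bound: key of prefix + U+FFFF, which sorts after every
    # continuation of the prefix in this toy ordering.
    return sort_key(prefix), sort_key(prefix + "\uffff")


def prefix_scan(sorted_rows, prefix):
    # Mimics an index range scan: key >= lower AND key < upper.
    lower, upper = prefix_range(prefix)
    keys = [sort_key(r) for r in sorted_rows]
    start = bisect_left(keys, lower)
    stop = bisect_left(keys, upper)
    return sorted_rows[start:stop]


# Rows kept in collation order, as a btree index would store them.
rows = sorted(
    ["Smith", "SMITH", "sMiTh", "smithson", "Smithers",
     "smit", "smyth", "Jones"],
    key=sort_key,
)
matches = prefix_scan(rows, "smith")
print(matches)
```

In this model the range scan returns exactly the rows whose key starts with "smith", which is the behaviour the LIKE 'smith%' rewrite depends on.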
