Wrong results with equality search using trigram index and non-deterministic collation

From: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Wrong results with equality search using trigram index and non-deterministic collation
Date: 2024-09-17 06:00:18
Message-ID: 8ef4899c4acfebca45cc6c042a6dc611d25ffab1.camel@cybertec.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Using a trigram index with an non-deterministic collation can
lead to wrong query results:

CREATE COLLATION faux_cn (PROVIDER = icu, LOCALE = 'und', DETERMINISTIC = FALSE, RULES = '&l = r');

CREATE TABLE boom (id integer PRIMARY KEY, t text COLLATE faux_cn);

INSERT INTO boom VALUES (1, 'right'), (2, 'light');

SELECT * FROM boom WHERE t = 'right';

id │ t
════╪═══════
1 │ right
2 │ light
(2 rows)

CREATE INDEX ON boom USING gin (t gin_trgm_ops);

SET enable_seqscan = off;

SELECT * FROM boom WHERE t = 'right';

id │ t
════╪═══════
1 │ right
(1 row)

I also see questionable results with the similarity operator (with and
without the index):

SELECT * FROM boom WHERE t % 'rigor';

id │ t
════╪═══════
1 │ right
(1 row)

But here you could argue that the operator ignores the collation, so
the result is correct. With equality, there is no such loophole.

I don't know what the correct fix would be. Perhaps just refusing to use
the index for equality comparisons with non-deterministic collations.

Yours,
Laurenz Albe

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tatsuo Ishii 2024-09-17 06:06:26 Re: Add memory/disk usage for WindowAgg nodes in EXPLAIN
Previous Message Alexander Lakhin 2024-09-17 06:00:00 Re: [HACKERS] make async slave to wait for lsn to be replayed