Re: Support LIKE with nondeterministic collations

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Eisentraut <peter(at)eisentraut(dot)org>
Cc: Daniel Verite <daniel(at)manitou-mail(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support LIKE with nondeterministic collations
Date: 2024-05-03 00:11:48
Message-ID: CA+TgmoYntBJZnnZrRBXMgbXEXrp0Bm8yje4VAjE5LX6aXwj9_w@mail.gmail.com
Lists: pgsql-hackers

On Thu, May 2, 2024 at 9:38 AM Peter Eisentraut <peter(at)eisentraut(dot)org> wrote:
> On 30.04.24 14:39, Daniel Verite wrote:
> > postgres=# SELECT '.foo.' like '_oo' COLLATE ign_punct;
> > ?column?
> > ----------
> > f
> > (1 row)
> >
> > The first two results look fine, but the next one is inconsistent.
>
> This is correct, because '_' means "any single character". This is
> independent of the collation.

Seems really counterintuitive. I had to think for a long time to be
able to guess what was happening here. Finally I came up with this
guess:

If the collation-aware matching tries to match up f with the initial
period, the period is skipped and the f matches f. But when the
wildcard is matched to the initial period, that uses up the wildcard
and then we're left trying to match o with f, which doesn't work.

Is that right?
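
To make the guess above concrete, here is a toy model in Python. It is
only a sketch under stated assumptions, not PostgreSQL's actual matching
code: toy_like and collation_equal are hypothetical names, the collation
is approximated as "ASCII punctuation is ignored when comparing", '%' is
not handled, and what happens when the pattern runs out is an arbitrary
choice (so fo_ is deliberately not exercised).

```python
import string

# Assumption: the collation treats ASCII punctuation as ignorable.
IGNORABLE = set(string.punctuation)


def collation_equal(a: str, b: str) -> bool:
    """Compare two strings while ignoring punctuation (toy collation)."""
    def strip(s: str) -> str:
        return "".join(ch for ch in s if ch not in IGNORABLE)
    return strip(a) == strip(b)


def toy_like(text: str, pattern: str) -> bool:
    """Toy LIKE supporting only literals and '_' (no '%')."""
    if not pattern:
        # Arbitrary choice for this sketch: nothing may be left over.
        return text == ""
    if pattern[0] == "_":
        # '_' consumes exactly one character, ignorable or not.
        return len(text) >= 1 and toy_like(text[1:], pattern[1:])
    # Split off the literal run that precedes the next '_'.
    end = pattern.find("_")
    if end == -1:
        end = len(pattern)
    literal, rest = pattern[:end], pattern[end:]
    if not rest:
        # Last literal run: compare it with all remaining text.
        return collation_equal(text, literal)
    # Otherwise try every prefix of the text as a match for the literal.
    return any(
        collation_equal(text[:i], literal) and toy_like(text[i:], rest)
        for i in range(len(text) + 1)
    )


print(toy_like(".foo.", "foo"))  # literal compared under the collation
print(toy_like(".foo.", "f_o"))  # 'f' absorbs the leading period
print(toy_like(".foo.", "_oo"))  # '_' uses up the leading period
```

In this model the first two calls return True and the third returns
False, because '_' is spent on the period and "foo." is then compared
with "oo".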

It'd probably be good to use something like this as an example in the
documentation. My intuition is that if foo matches a string, then _oo,
f_o, and fo_ should also match that string. Apparently that's not the
case, but I doubt I'll be the last one who thinks it should be.

--
Robert Haas
EDB: http://www.enterprisedb.com
