Re: Boyer-Moore string searching in LIKE, ILIKE, and/or POSITION?

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Martin L(dot) Buchanan" <martinlbuchanan(at)gmail(dot)com>, pgsql-general list <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Boyer-Moore string searching in LIKE, ILIKE, and/or POSITION?
Date: 2023-02-09 02:06:34
Message-ID: CAApHDvruPCe3Sz5M=y=zzgT71Xj8chwD-MvXTih9b934V8Wf1w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Thu, 9 Feb 2023 at 14:49, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> David Rowley <dgrowleyml(at)gmail(dot)com> writes:
> > Tom's argument seems to think it's impossible, so if you find that
> > it's definitely not impossible, then you can assume he's wrong about
> > that.
>
> My point was that it seems like you'd need a separate BMH engine for
> each %-separated segment of the LIKE pattern. I'm not quite clear on
> whether BMH can handle '_' (single-char wildcard) conveniently by
> itself, although my gut feel is that you can probably make that part
> work. Maybe you can even extend the idea to embedded %, but that
> seems more difficult.

Yeah, I think to make it work with more complex patterns like
'%some%string%' or '%some_string%' you'd need to break that into
multiple independent searches for each portion between a wildcard
character. For the former pattern, you'd need to do some final check
that ensures that the 2nd pattern was found in some position >= the
position of the 1st pattern + its length. For the latter, the final
check would need to validate that the 2nd pattern was found at a
position of the first pattern + its length + 1. It's probably also
possible to make those patterns work when they don't contain the
leading and trailing % by checking that the first pattern is found at
position 0 and the end of the 2nd pattern is found at the end of the
search string.

However, I imagine going to the trouble of trying to make it work for
more complex patterns initially would be a bad idea. I imagine there
are just too many cases where we could demonstrate performance
regressions and that would cause us to reject the patch.

David

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Vladimir Sitnikov 2023-02-09 05:41:46 Re: Boyer-Moore string searching in LIKE, ILIKE, and/or POSITION?
Previous Message Tom Lane 2023-02-09 01:49:43 Re: Boyer-Moore string searching in LIKE, ILIKE, and/or POSITION?