Re: Indexing and regular expressions

From: Neil Conway <nconway(at)klamath(dot)dyndns(dot)org>
To: Kjartan <a98kjaas(at)student(dot)his(dot)se>
Cc: oleg(at)sai(dot)msu(dot)su, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Indexing and regular expressions
Date: 2002-04-07 19:08:44
Message-ID: 20020407150844.2b0df1ff.nconway@klamath.dyndns.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, 7 Apr 2002 13:49:08 +0200
"Kjartan " <a98kjaas(at)student(dot)his(dot)se> wrote:
> Thank you for your reply.
>
> No, I am not looking for a fuzzy match. I am simply wondering if there
> are some methods available that can speed up joining of tables when
> the join is done with a regular expression operator (one table
> contains regular expression patterns, and the other strings that
> should be matched against the pattern).
>
> If no indexes are used, a full table scan of the string table is
> needed if we want to select all strings that matches a given pattern.

If the pattern is arbitrary, I'm not sure that any indexing technique
will be able to significantly improve performance (or at least, I'd
be *very* interested to see such a technique).

If you're storing the regular expressions in another table, you
could also store the pre-compiled patterns and use those for a
seqscan, but that won't net you a large improvement, particularly
if your set of patterns is much smaller than your set of strings
to match against (in which case the time to compile the pattern
becomes insignificant).

If your regular expressions contain common sub-elements (e.g.
many of them include "match string beginning with xxx" or whatever),
you could perhaps use indexes to optimize those sub-elements,
and then run the rest of the pattern on the tuples found by the
index. But if your patterns are truly arbitrary, this will be
unlikely to help.

Therefore, the answer is no AFAICT -- regular expressions are too
flexible to allow for optimization in advance.

Cheers,

Neil

--
Neil Conway <neilconway(at)rogers(dot)com>
PGP Key ID: DB3C29FC

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Joe Conway 2002-04-07 23:18:57 unknownin/out patch (was [HACKERS] PQescapeBytea is not multibyte aware)
Previous Message Tom Lane 2002-04-07 17:10:53 Re: PQescapeBytea is not multibyte aware