Re: Indexing and regular expressions

From: Kjartan Бsюуrsson <a98kjaas(at)student(dot)his(dot)se>
To: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Indexing and regular expressions
Date: 2002-04-07 11:49:08
Message-ID: 148154092062.20020407134908@student.his.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thank you for your reply.

No, I am not looking for a fuzzy match. I am simply wondering if there
are some methods available that can speed up joining of tables when
the join is done with a regular expression operator (one table
contains regular expression patterns, and the other strings that
should be matched against the pattern).

If no indexes are used, a full table scan of the string table is
needed if we want to select all strings that matches a given pattern.

My question is if there are any indexing methods available that can
be used to prone the search space in the string table when using
the regular expression operator.

I hope I made myself more clear now. Best regards,
Kjartan

Sunday, April 7, 2002, 12:54:27 PM, you wrote:

> On Sun, 7 Apr 2002, [ISO-8859-1] Kjartan аsЧСrsson wrote:

>> Is there any indexing technique available I can use when joining tables
>> with a regular expression pattern in pgsql?
>>
>> I know one method for indexing strings that will be matched with regular
>> expression patterns, and that is using so called k-gram indexes.
>> Indexing the string "kjartan" with k-gram index where k = 3 would
>> create "kja", "jar", "art", "rta", "tan" as an index. Ofcourse it is hard to

> Usually, k-grams technique is used to match patterns with errors and
> 3-grams produce "__k", "_kj", "kja", "jar", "art", "rta", "ta_", "a__"
> where leading and trailing spaces are used to compensate 'boundary' effect.

> But I dont' quite understand your question. Are you looking for fuzzy match ?
> If so, take a look on contrib modules.

>> decide the size of k and I'm sure in many cases mulitple k values might
>> be needed, depending on the situation.
>>
>> I have not done any major survey of available techniques, but I was
>> hoping I could get some pointers here.
>>
>> I assume pgsql always uses nested loop join when joining relations which are
>> joined with regular expression pattern?
>>

> Regards,
> Oleg
> _____________________________________________________________
> Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
> Sternberg Astronomical Institute, Moscow University (Russia)
> Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
> phone: +007(095)939-16-83, +007(095)939-23-83

> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo(at)postgresql(dot)org

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Gavin Sherry 2002-04-07 12:27:08 Re: Suggestion for optimization
Previous Message Hiroshi Inoue 2002-04-07 11:44:55 Re: What's the CURRENT schema ?