Re: Question about POSIX Regular Expressions performance on large dataset.

From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Jose Ildefonso Camargo Tolosa <ildefonso(dot)camargo(at)gmail(dot)com>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: Question about POSIX Regular Expressions performance on large dataset.
Date: 2010-08-18 02:28:09
Message-ID: AANLkTimaiPKsy8xgyseSxD1qn-en7ciDP=U6kniNeFwH@mail.gmail.com
Lists: pgsql-sql

On Tue, Aug 17, 2010 at 8:21 PM, Jose Ildefonso Camargo Tolosa
<ildefonso(dot)camargo(at)gmail(dot)com> wrote:
> Hi!
>
> I'm evaluating PostgreSQL for storing a huge amount of data (around
> 1000M records or so). The records are short (each has just a
> timestamp and a string of fewer than 128 characters), but the strings
> will be matched against POSIX regular expressions (different regexps,
> possibly complex ones).
>
> I don't have a system large enough to test this here (I may be able
> to borrow a medium-size server, but that would take a week or more),
> so I decided to ask here first.  How does regexp matching perform in
> PostgreSQL?  Can it use indexes? My guess is no, because I don't see
> a general way to index for regexp matching :( so it would mean table
> scans over this huge dataset.
>
> What do you think of this?

Yes, it can index such things, but only in a fixed way: you can
create functional (expression) indexes on pre-built regexes. For
regexes whose pattern changes from query to query, you're correct, no
index will be used.
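
A minimal sketch of the fixed-pattern case, assuming a hypothetical
table log_entries and a made-up pattern '^ERROR [0-9]+' (neither comes
from the original question). It shows two ways to index one pre-built
regex; the planner can only consider these indexes when the query
repeats the same expression.

```sql
-- Hypothetical table matching the description in the question:
-- a timestamp plus a short string.
CREATE TABLE IF NOT EXISTS log_entries (
    logged_at timestamptz  NOT NULL,
    message   varchar(128) NOT NULL
);

-- Option 1: expression index on the boolean result of one fixed regex.
-- The extra parentheses around the expression are required.
CREATE INDEX log_entries_error_expr_idx
    ON log_entries ((message ~ '^ERROR [0-9]+'));

-- Option 2: partial index that only contains rows matching the fixed regex.
CREATE INDEX log_entries_error_partial_idx
    ON log_entries (logged_at)
    WHERE message ~ '^ERROR [0-9]+';

-- A query that repeats the same fixed pattern can use either index.
SELECT logged_at, message
FROM log_entries
WHERE message ~ '^ERROR [0-9]+'
  AND logged_at >= '2010-08-01';

-- A query with a different pattern matches neither index definition,
-- so expect a sequential scan here.
SELECT logged_at, message
FROM log_entries
WHERE message ~ 'timeout after [0-9]+ ms';
```

Running EXPLAIN on each SELECT should show which plan is actually
chosen; on a small or empty table the planner may prefer a sequential
scan even when an index qualifies.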

Could full text searching be used instead?
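
As a rough illustration of the full text alternative, assuming the
same hypothetical log_entries table, the 'english' text search
configuration, and made-up search terms. Full text search matches
whole words (lexemes), not arbitrary patterns, so it only helps if the
regexes can be restated as word queries.

```sql
-- Hypothetical table: a timestamp plus a short string.
CREATE TABLE IF NOT EXISTS log_entries (
    logged_at timestamptz  NOT NULL,
    message   varchar(128) NOT NULL
);

-- GIN index over the parsed text. Naming the configuration explicitly
-- keeps the expression immutable, which an index expression requires.
CREATE INDEX log_entries_message_fts_idx
    ON log_entries
    USING gin (to_tsvector('english', message));

-- The query must use the same to_tsvector expression to be indexable.
-- The search terms can differ on every query, unlike the fixed-regex case.
SELECT logged_at, message
FROM log_entries
WHERE to_tsvector('english', message) @@ to_tsquery('english', 'disk & failure');
```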
