Re: Regex performance issue

From: "Alexandru Coseru" <alexandru(dot)coseru(at)totaltelecom(dot)ro>
To: "Oleg Bartunov" <oleg(at)sai(dot)msu(dot)su>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Regex performance issue
Date: 2006-12-03 00:55:27
Message-ID: 024d01c71675$b8fa61f0$3cb16956@alex
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hello..

I have never used tsearch2 , but at a first glance , i would not see any
major improvement , because the main advantage of tsearch is the splitting
in words of a phrase..
But here , i only have one word (no spaces).

Regards
Alex
----- Original Message -----
From: "Oleg Bartunov" <oleg(at)sai(dot)msu(dot)su>
To: "Alexandru Coseru" <alexandru(dot)coseru(at)totaltelecom(dot)ro>
Cc: "Dave Dutcher" <dave(at)tridecap(dot)com>; <pgsql-performance(at)postgresql(dot)org>
Sent: Saturday, December 02, 2006 10:54 PM
Subject: Re: [PERFORM] Regex performance issue

>I may miss something but I'd use tsearch2. Check
> intdict dictionary for basic idea -
> http://www.sai.msu.su/~megera/wiki/Gendict
>
> Oleg
> On Sat, 2 Dec 2006, Alexandru Coseru wrote:
>
>> Hello...
>>
>> I cannot use LIKE , because the order of the match is reversed.
>> The prefix column is containing telephone destinations.
>> IE: ^001 - US , ^0039 Italy , etc..
>>
>> Here is a small sample:
>>
>> asterisk=> select * from destlist LIMIT 10;
>> id | id_ent | dir | prefix | country | network | tip
>> ----+--------+-----+------------+-------------+--------------------+-----
>> 1 | -1 | 0 | (^0093) | AFGHANISTAN | AFGHANISTAN | 6
>> 2 | -1 | 0 | (^00937) | AFGHANISTAN | AFGHANISTAN Mobile | 5
>> 3 | -1 | 0 | (^00355) | ALBANIA | ALBANIA | 6
>> 4 | -1 | 0 | (^0035538) | ALBANIA | ALBANIA Mobile | 5
>> 5 | -1 | 0 | (^0035568) | ALBANIA | ALBANIA Mobile | 5
>> 6 | -1 | 0 | (^0035569) | ALBANIA | ALBANIA Mobile | 5
>> 7 | -1 | 0 | (^00213) | ALGERIA | ALGERIA | 6
>> 8 | -1 | 0 | (^0021361) | ALGERIA | ALGERIA Mobile | 5
>> 9 | -1 | 0 | (^0021362) | ALGERIA | ALGERIA Mobile | 5
>> 10 | -1 | 0 | (^0021363) | ALGERIA | ALGERIA Mobile | 5
>>
>>
>> Now , I have to match a dialednumber (let's say 00213618833) and find
>> it's destination...(It's algeria mobile).
>> I tried to make with a query of using LIKE , but i was not able to get
>> something..
>>
>>
>> Regards
>> Alex
>>
>>
>>
>>
>>
>> ----- Original Message ----- From: "Dave Dutcher" <dave(at)tridecap(dot)com>
>> To: "'Alexandru Coseru'" <alexandru(dot)coseru(at)totaltelecom(dot)ro>;
>> <pgsql-performance(at)postgresql(dot)org>
>> Sent: Saturday, December 02, 2006 10:36 PM
>> Subject: RE: [PERFORM] Regex performance issue
>>
>>
>>> -----Original Message-----
>>> From: pgsql-performance-owner(at)postgresql(dot)org On Behalf Of Alexandru
>>> Coseru
>>> asterisk=> explain analyze SELECT * FROM destlist WHERE
>>> '0039051248787' ~
>>> prefix AND id_ent='-2' AND dir=0 ORDER by length(prefix) DESC;
>>>
>>>
>>> QUERY PLAN
>>> --------------------------------------------------------------
>>> ----------------------------------------------------------------------
>>> Sort (cost=7925.07..7925.15 rows=31 width=67) (actual
>>> time=857.715..857.716 rows=2 loops=1)
>>> Sort Key: length((prefix)::text)
>>> -> Bitmap Heap Scan on destlist (cost=60.16..7924.30
>>> rows=31 width=67)
>>> (actual time=2.156..857.686 rows=2 loops=1)
>>> Recheck Cond: ((id_ent = -2) AND (dir = 0))
>>> Filter: ('0039051248787'::text ~ (prefix)::text)
>>> -> Bitmap Index Scan on destlist_indx2 (cost=0.00..60.16
>>> rows=6193 width=0) (actual time=1.961..1.961 rows=5205 loops=1)
>>> Index Cond: ((id_ent = -2) AND (dir = 0))
>>> Total runtime: 857.804 ms
>>> (8 rows)
>>>
>>> "mmumu" btree (prefix varchar_pattern_ops)
>>>
>>
>> I'm surpised Postgres isn't using the index on prefix seeing as the index
>> uses the varchar_pattern_ops operator class. It could be that the index
>> isn't selective enough, or is Postgres not able to use an index with
>> Posix
>> regular expressions? The docs seem to say that it can, but I'd be
>> curious
>> to see what happens if you use LIKE instead of ~.
>>
>> Dave
>>
>>
>>
>>
>>
>>
>
> Regards,
> Oleg
> _____________________________________________________________
> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru)
> Sternberg Astronomical Institute, Moscow University, Russia
> Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
> phone: +007(495)939-16-83, +007(495)939-23-83
>
>
>
> --
> No virus found in this incoming message.
> Checked by AVG Free Edition.
> Version: 7.1.409 / Virus Database: 268.15.4/563 - Release Date: 12/2/2006
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Scott Marlowe 2006-12-03 01:31:37 Re: Regex performance issue
Previous Message Alexandru Coseru 2006-12-03 00:53:53 Re: Regex performance issue