Re: Regex performance issue

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Alexandru Coseru <alexandru(dot)coseru(at)totaltelecom(dot)ro>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Regex performance issue
Date: 2006-12-03 07:05:17
Message-ID: Pine.LNX.4.64.0612031003400.16338@sn.sai.msu.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Sun, 3 Dec 2006, Alexandru Coseru wrote:

> Hello..
>
> I have never used tsearch2 , but at a first glance , i would not see any
> major improvement , because the main advantage of tsearch is the splitting in
> words of a phrase..
> But here , i only have one word (no spaces).

Oh, yes, I was confused :) What if you consider you prefix as
1.2.3.4.5.6, then you could try our contrib/ltree module.

Oleg

>
>
> Regards
> Alex
> ----- Original Message ----- From: "Oleg Bartunov" <oleg(at)sai(dot)msu(dot)su>
> To: "Alexandru Coseru" <alexandru(dot)coseru(at)totaltelecom(dot)ro>
> Cc: "Dave Dutcher" <dave(at)tridecap(dot)com>; <pgsql-performance(at)postgresql(dot)org>
> Sent: Saturday, December 02, 2006 10:54 PM
> Subject: Re: [PERFORM] Regex performance issue
>
>
>> I may miss something but I'd use tsearch2. Check
>> intdict dictionary for basic idea -
>> http://www.sai.msu.su/~megera/wiki/Gendict
>>
>> Oleg
>> On Sat, 2 Dec 2006, Alexandru Coseru wrote:
>>
>>> Hello...
>>>
>>> I cannot use LIKE , because the order of the match is reversed.
>>> The prefix column is containing telephone destinations.
>>> IE: ^001 - US , ^0039 Italy , etc..
>>>
>>> Here is a small sample:
>>>
>>> asterisk=> select * from destlist LIMIT 10;
>>> id | id_ent | dir | prefix | country | network | tip
>>> ----+--------+-----+------------+-------------+--------------------+-----
>>> 1 | -1 | 0 | (^0093) | AFGHANISTAN | AFGHANISTAN | 6
>>> 2 | -1 | 0 | (^00937) | AFGHANISTAN | AFGHANISTAN Mobile | 5
>>> 3 | -1 | 0 | (^00355) | ALBANIA | ALBANIA | 6
>>> 4 | -1 | 0 | (^0035538) | ALBANIA | ALBANIA Mobile | 5
>>> 5 | -1 | 0 | (^0035568) | ALBANIA | ALBANIA Mobile | 5
>>> 6 | -1 | 0 | (^0035569) | ALBANIA | ALBANIA Mobile | 5
>>> 7 | -1 | 0 | (^00213) | ALGERIA | ALGERIA | 6
>>> 8 | -1 | 0 | (^0021361) | ALGERIA | ALGERIA Mobile | 5
>>> 9 | -1 | 0 | (^0021362) | ALGERIA | ALGERIA Mobile | 5
>>> 10 | -1 | 0 | (^0021363) | ALGERIA | ALGERIA Mobile | 5
>>>
>>>
>>> Now , I have to match a dialednumber (let's say 00213618833) and find
>>> it's destination...(It's algeria mobile).
>>> I tried to make with a query of using LIKE , but i was not able to get
>>> something..
>>>
>>>
>>> Regards
>>> Alex
>>>
>>>
>>>
>>>
>>>
>>> ----- Original Message ----- From: "Dave Dutcher" <dave(at)tridecap(dot)com>
>>> To: "'Alexandru Coseru'" <alexandru(dot)coseru(at)totaltelecom(dot)ro>;
>>> <pgsql-performance(at)postgresql(dot)org>
>>> Sent: Saturday, December 02, 2006 10:36 PM
>>> Subject: RE: [PERFORM] Regex performance issue
>>>
>>>
>>>> -----Original Message-----
>>>> From: pgsql-performance-owner(at)postgresql(dot)org On Behalf Of Alexandru
>>>> Coseru
>>>> asterisk=> explain analyze SELECT * FROM destlist WHERE
>>>> '0039051248787' ~
>>>> prefix AND id_ent='-2' AND dir=0 ORDER by length(prefix) DESC;
>>>>
>>>>
>>>> QUERY PLAN
>>>> --------------------------------------------------------------
>>>> ----------------------------------------------------------------------
>>>> Sort (cost=7925.07..7925.15 rows=31 width=67) (actual
>>>> time=857.715..857.716 rows=2 loops=1)
>>>> Sort Key: length((prefix)::text)
>>>> -> Bitmap Heap Scan on destlist (cost=60.16..7924.30
>>>> rows=31 width=67)
>>>> (actual time=2.156..857.686 rows=2 loops=1)
>>>> Recheck Cond: ((id_ent = -2) AND (dir = 0))
>>>> Filter: ('0039051248787'::text ~ (prefix)::text)
>>>> -> Bitmap Index Scan on destlist_indx2 (cost=0.00..60.16
>>>> rows=6193 width=0) (actual time=1.961..1.961 rows=5205 loops=1)
>>>> Index Cond: ((id_ent = -2) AND (dir = 0))
>>>> Total runtime: 857.804 ms
>>>> (8 rows)
>>>>
>>>> "mmumu" btree (prefix varchar_pattern_ops)
>>>>
>>>
>>> I'm surpised Postgres isn't using the index on prefix seeing as the index
>>> uses the varchar_pattern_ops operator class. It could be that the index
>>> isn't selective enough, or is Postgres not able to use an index with Posix
>>> regular expressions? The docs seem to say that it can, but I'd be curious
>>> to see what happens if you use LIKE instead of ~.
>>>
>>> Dave
>>>
>>>
>>>
>>>
>>>
>>>
>>
>> Regards,
>> Oleg
>> _____________________________________________________________
>> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru)
>> Sternberg Astronomical Institute, Moscow University, Russia
>> Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
>> phone: +007(495)939-16-83, +007(495)939-23-83
>>
>>
>>
>> --
>> No virus found in this incoming message.
>> Checked by AVG Free Edition.
>> Version: 7.1.409 / Virus Database: 268.15.4/563 - Release Date: 12/2/2006
>>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru)
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Joost Kraaijeveld 2006-12-03 08:59:25 Which query analiser tools are available?
Previous Message Tom Lane 2006-12-03 04:05:12 Re: Regex performance issue