Re: Regex performance issue

From: "Alexandru Coseru" <alexandru(dot)coseru(at)totaltelecom(dot)ro>
To: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Regex performance issue
Date: 2006-12-03 00:53:53
Message-ID: 024601c71675$81277560$3cb16956@alex
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hello..

Thanks for the tip , i think i have got the ideea..

I'm too tired too , and i will try it tommorow.

Anyway , anybody has a clue why this regex is that CPU intensive ? I did
not saw the light on my drives blinking , and also vmstat doesn't yeld any
blocks in or out...
And how can it be optimized ?

Is there a way to trace the system calls ?
strace doesn't give me anything else but some lseeks and reads...

PS: Tried it with a 8.2 snaphsot and the result is the same..

Regards
Alex
----- Original Message -----
From: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
To: "Alexandru Coseru" <alexandru(dot)coseru(at)totaltelecom(dot)ro>
Cc: <pgsql-performance(at)postgresql(dot)org>
Sent: Sunday, December 03, 2006 12:35 AM
Subject: Re: [PERFORM] Regex performance issue

> Alexandru Coseru wrote:
>> Hello..
>>
>> I cannot use the first advice , because i'm not aware of the prefix
>> length in the database...
>> This is why i'm ordering after length(prefix)..
>>
>>
>> On the 2nd one , i'm not sure that i can follow you..
>
> Ok, let me try again :)
>
>> asterisk=> select * from destlist LIMIT 10;
>> id | id_ent | dir | prefix | country | network | tip
>> ----+--------+-----+------------+-------------+--------------------+-----
>> 1 | -1 | 0 | (^0093) | AFGHANISTAN | AFGHANISTAN | 6
>> 2 | -1 | 0 | (^00937) | AFGHANISTAN | AFGHANISTAN Mobile | 5
>> 3 | -1 | 0 | (^00355) | ALBANIA | ALBANIA | 6
>> 4 | -1 | 0 | (^0035538) | ALBANIA | ALBANIA Mobile | 5
>> 5 | -1 | 0 | (^0035568) | ALBANIA | ALBANIA Mobile | 5
>> 6 | -1 | 0 | (^0035569) | ALBANIA | ALBANIA Mobile | 5
>> 7 | -1 | 0 | (^00213) | ALGERIA | ALGERIA | 6
>> 8 | -1 | 0 | (^0021361) | ALGERIA | ALGERIA Mobile | 5
>> 9 | -1 | 0 | (^0021362) | ALGERIA | ALGERIA Mobile | 5
>> 10 | -1 | 0 | (^0021363) | ALGERIA | ALGERIA Mobile | 5
>
> Store the prefix in a character column, without the regex stuff. Like
> below. I've removed the columns that are not relevant, in fact it would
> make sense to store them in another table, and have just the id and prefix
> in this table.
>
> id | prefix | network
> ---+---------+--------------------
> 1 | 0093 | AFGHANISTAN
> 2 | 00937 | AFGHANISTAN Mobile
> 3 | 00355 | ALBANIA
> 4 | 0035538 | ALBANIA Mobile
> 5 | 0035568 | ALBANIA Mobile
> 6 | 0035569 | ALBANIA Mobile
> 7 | 00213 | ALGERIA
> 8 | 0021361 | ALGERIA Mobile
> 9 | 0021362 | ALGERIA Mobile
> 10 | 0021363 | ALGERIA Mobile
>
> Now, add the rows marked with start below:
>
> id | prefix | network
> ----+---------+--------------------
> 1 | 0093 | AFGHANISTAN
> 2 | 00937 | AFGHANISTAN Mobile
> * 1 | 00938 | AFGHANISTAN
> 3 | 00355 | ALBANIA
> 4 | 0035538 | ALBANIA Mobile
> * 3 | 0035539 | ALBANIA
> 5 | 0035568 | ALBANIA Mobile
> 6 | 0035569 | ALBANIA Mobile
> * 3 | 003557 | ALBANIA
> 7 | 00213 | ALGERIA
> 8 | 0021361 | ALGERIA Mobile
> 9 | 0021362 | ALGERIA Mobile
> 10 | 0021363 | ALGERIA Mobile
> * 7 | 0021364 | ALGERIA
>
> The added rows make it unnecessary to use regex for the searches. You can
> use just the >= operator like this: (using the example number you gave)
>
> SELECT id FROM destlist WHERE '00213618833' >= prefix ORDER BY prefix
> LIMIT 1
>
> Which would return id 7. As another example, a query of "00213654321"
> would match the last row, which again has id 7.
>
> I'm too tired to figure out the exact algorithm for adding the rows, but
> I'm pretty sure it can be automated... The basic idea is that when there's
> a row with id A and prefix XXXX, and another row with id B and prefix
> XXXXY, we add another row with id A and prefix XXXX(Y+1).
>
> --
> Heikki Linnakangas
> EnterpriseDB http://www.enterprisedb.com
>
>
>
> --
> No virus found in this incoming message.
> Checked by AVG Free Edition.
> Version: 7.1.409 / Virus Database: 268.15.4/563 - Release Date: 12/2/2006
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Alexandru Coseru 2006-12-03 00:55:27 Re: Regex performance issue
Previous Message Heikki Linnakangas 2006-12-02 22:35:18 Re: Regex performance issue