Re: How to optimize query that concatenates strings?

From: Chander Ganesan <chander(at)otg-nc(dot)com>
To: badlydrawnbhoy <badlydrawnbhoy(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: How to optimize query that concatenates strings?
Date: 2006-07-07 16:37:34
Message-ID: 44AE8DCE.2070301@otg-nc.com
Lists: pgsql-general

You could build a function-based index (PostgreSQL calls this an index
on an expression) that contains the "simplified" version of each URL
(in your case, the URL with the trailing '/' stripped). Then apply the
same function to the URL you are searching for. PostgreSQL can use the
index as long as the WHERE clause contains the same expression the
index was built on.

Take a look at the "Indexes on Expressions" section of the PostgreSQL
documentation.

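select ... from ... where simplify(url_col) = simplify('umist.ac.uk/');

In the example above, 'url_col' would have a function-based index that
was built on 'simplify(url_col)'. Note that the indexed expression has
to appear on the column side of the comparison, and the incoming URL is
simplified the same way before it is compared.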
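
To make the idea concrete, here is a rough sketch against the 'url'
table and 'url' column from your post. The function body, the use of
rtrim(), and the index name url_simplified_idx are my assumptions, not
something from your schema, so adjust them to whatever "functionally
equivalent" should mean for you. Note that rtrim() strips every
trailing '/', not just one.

```sql
-- Hypothetical helper: normalizes a URL by removing trailing slashes,
-- so 'umist.ac.uk/' and 'umist.ac.uk' simplify to the same value.
-- It must be declared IMMUTABLE to be usable in an index expression.
CREATE FUNCTION simplify(text) RETURNS text AS $$
    SELECT rtrim($1, '/');
$$ LANGUAGE sql IMMUTABLE;

-- Expression index on the simplified form of each stored URL
-- (index name is an assumption).
CREATE INDEX url_simplified_idx ON url (simplify(url));

-- The WHERE clause repeats the indexed expression on the column side
-- and simplifies the incoming value the same way.
SELECT EXISTS (
    SELECT 1 FROM url
    WHERE simplify(url) = simplify('umist.ac.uk/')
) AS present;
```

Run ANALYZE on the table after creating the index, then check with
EXPLAIN whether the planner picks an index scan on the new index
instead of the sequential scan.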

Chander Ganesan
Open Technology Group, Inc.
One Copley Parkway, Suite 210
Morrisville, NC 27560
Phone: 877-258-8987/919-463-0999

badlydrawnbhoy wrote:
> Hi all,
>
> I've got a database of URLs, and when inserting new data into it I want
> to make sure that there are no functionally equivalent URLs already
> present. For example, 'umist.ac.uk' is functionally the same as
> 'umist.ac.uk/'.
>
> I find that searching for the latter form, using string concatenation
> to append the trailing slash, is much slower than searching for a
> simple string - the index on URL name isn't used to speed up the
> search.
>
> Here's an illustration
>
> url=# explain select exists(select * from url where url = 'umist.ac.uk'
> or url || '/' = 'umist.ac.uk') as present;
> QUERY PLAN
>
> -----------------------------------------------------------------------------------------------
> Result (cost=47664.01..47664.02 rows=1 width=0)
> InitPlan
> -> Seq Scan on url (cost=0.00..47664.01 rows=6532 width=38)
> Filter: ((url = 'umist.ac.uk'::text) OR ((url || '/'::text)
> = 'umist.ac.uk'::text))
> (4 rows)
>
> url=# explain select exists(select * from url where url =
> 'umist.ac.uk') as present;
> QUERY PLAN
> ----------------------------------------------------------------------------
> Result (cost=5.97..5.98 rows=1 width=0)
> InitPlan
> -> Index Scan using url_idx on url (cost=0.00..5.97 rows=1
> width=38)
> Index Cond: (url = 'umist.ac.uk'::text)
> (4 rows)
>
>
> Is there any way I can force postgres to use the index when using the
> string concatenation in the query?
>
> Thanks in advance,
>
> BBB
