From: | Ron Chmara <ron(at)Opus1(dot)COM> |
---|---|
To: | The Hermit Hacker <scrappy(at)hub(dot)org> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: How to get around LIKE inefficiencies? |
Date: | 2000-11-06 03:19:21 |
Message-ID: | 3A062337.23BE3AE3@opus1.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
The Hermit Hacker wrote:
> I'm tryin to figure out how to speed up udmsearch when run under
> postgresql, and am being hit by atrocious performance when using a LIKE
> query ... the query looks like:
> SELECT ndict.url_id,ndict.intag
> FROM ndict,url
> WHERE ndict.word_id=1971739852
> AND url.rec_id=ndict.url_id
> AND (url.url LIKE 'http://www.postgresql.org/%');
> Take off the AND ( LIKE ) part of the query, finishes almost as soon as
> you hit return. Put it back in, and you can go for coffee before it
> finishes ...
The entire *approach* is wrong. I'm currently in the process of optimizing
a db which is used for logfile mining, and it was originally built with the same
kludge.... it seems to make sense when there's only a few thousand records,
but at 20 million records, yikes!
The problem is that there's a "like" operation for something that is
fundamentally static (http://www.postgresql.org/) with some varying
data *after it*, that you're not using, in any form, for this operation.
This can be solved one of two ways:
1. Preprocess your files to strip out the paths and arguments on
a new field for the domain call. You are only setting up that data once,
so you shouldn't be using a "like" operator for every query. It's not
like on monday the server is "http://www.postgresql.org/1221" and on
tuesday the server is "http://www.postgresql.org/12111". It's always
the *same server*, so split out that data into it's own column, it's own
index.
This turns your query into:
SELECT ndict.url_id,ndict.intag
FROM ndict,url
WHERE ndict.word_id=1971739852
AND url.rec_id=ndict.url_id
AND url.server_url='http://www.postgresql.org/';
2. Trigger to do the above, if you're doing on-the-fly inserts into
your db (so you can't pre-process).
-Ronabop
--
Brought to you from iBop the iMac, a MacOS, Win95, Win98, LinuxPPC machine,
which is currently in MacOS land. Your bopping may vary.
From | Date | Subject | |
---|---|---|---|
Next Message | The Hermit Hacker | 2000-11-06 03:34:11 | Re: How to get around LIKE inefficiencies? |
Previous Message | Philip Warner | 2000-11-06 03:18:34 | Re: How to get around LIKE inefficiencies? |