From: | Stephane Bortzmeyer <bortzmeyer(at)nic(dot)fr> |
---|---|
To: | badlydrawnbhoy <badlydrawnbhoy(at)gmail(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: How to optimize query that concatenates strings? |
Date: | 2006-07-07 20:58:41 |
Message-ID: | 20060707205841.GA10545@sources.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On Fri, Jul 07, 2006 at 04:29:51AM -0700,
badlydrawnbhoy <badlydrawnbhoy(at)gmail(dot)com> wrote
a message of 48 lines which said:
> I've got a database of URLs, and when inserting new data into it I
> want to make sure that there are no functionally equivalent URLs
> already present. For example, 'umist.ac.uk' is functionally the same
> as 'umist.ac.uk/'.
IMHO, your problem seems to be an instance of a very general class:
data that needs canonicalization. For instance, I work for a domain
name registry and domain names are case-insensitive. Do we store the
domain names blindly and then always use ILIKE or other
case-insensitive operators? No, we canonicalize domain names by
changing them to lower-case. That way:
* we do not have to think of using case-insensitive operators,
* indexes do work.
This is what I recommend here: decide on a canonical form and
canonicalize everything when it comes into the database (it is easy to
do it from a trigger).
If the original form is important, you can still store it in a column
intended for display, not for searching.
Here is a way to canonicalize, with a trigger. The function
"canonicalize" is left as an exercice (you can write it in PL/pgSQL,
C, Python, etc):
CREATE OR REPLACE FUNCTION force_canonicalization() RETURNS TRIGGER
AS 'BEGIN
NEW.url = canonicalize(NEW.url);
RETURN NEW;
END;'
LANGUAGE PLPGSQL;
CREATE TRIGGER force_canonicalization
BEFORE INSERT ON Things
FOR EACH ROW
EXECUTE PROCEDURE force_canonicalization();
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Loftis | 2006-07-07 21:52:01 | Bug? Changing where distinct occurs produces error? |
Previous Message | Adrian Klaver | 2006-07-07 20:42:21 | Re: How to insert .xls files into database |