From: | "Claus Guttesen" <kometen(at)gmail(dot)com> |
---|---|
To: | pgsql-sql(at)postgresql(dot)org |
Subject: | select from two tables |
Date: | 2008-09-15 10:04:55 |
Message-ID: | b41c75520809150304u628ae77eua3af6a54d5e1e59c@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-sql |
Hi.
I have two tables, images and duplicates. The images-table is our
current table and has approx. 90 mill. entries. I want to weed out
duplicate file-entries (based on the md5-checksum of the file and
user-id) and update the file name with the first entry found, if any.
The images-table is:
id serial primary key,
userid int,
filename text,
hashcode text,
and some additional fields like upload-time, exif-date etc.
Duplicates:
id serial primary key,
userid int,
filename text,
hashcode text,
ref_count int
Here is some pseudo-code (in rails) that I have tested. This is
somewhat slow and I want to speed it up:
a=0
while a < 10000
@image = select * from images where id = a;
if @image
@duplicate = select * from duplicates where userid = @image.userid
and hashcode = @image.hashcode
if @duplicates
update @duplicates set ref_count = @duplicates.ref_count + 1
else
insert into duplicates (foo) values (bar)
end
end
a++
end
What I'd like to do is to perform a single query where I select from
both tables and then test whether the file is all-ready in duplicates:
@rec = select * from images i and duplicates d where i.id = a and
d.userid = i.userid and d.hashcode = i.hashcode
if @rec.images and @rec.duplicates
update duplicates.ref_count
else
insert into duplicates (foo) values (bar)
end
--
regards
Claus
When lenity and cruelty play for a kingdom,
the gentler gamester is the soonest winner.
Shakespeare
From | Date | Subject | |
---|---|---|---|
Next Message | Andreas Kretschmer | 2008-09-15 12:01:26 | Re: select from two tables |
Previous Message | Karl Grossner | 2008-09-14 23:59:44 | a simple transform |