From: | Edmund Bacon <ebacon(at)onesystem(dot)com> |
---|---|
To: | PostgreSQL SQL <pgsql-sql(at)postgresql(dot)org> |
Subject: | Re: Self-referencing table question |
Date: | 2005-03-24 16:13:38 |
Message-ID: | 4242E732.3050600@onesystem.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-sql |
Sometimes using a temp table is a better idea:
e.g.
-- start by creating a temp table 'tids' that hold the to_ids that
-- we are interested in.
SELECT to_id
INTO TEMP TABLE tids
FROM correlation
WHERE from_id = 1234
ORDER BY val DESC limit 100;
-- The following temp table makes use of the primary key on
-- the correlation table, and the stated goal from the original
-- question that:
-- from_id > to_id
-- and from_id in (tids.to_id)
-- and to_id in (tids.to_id)
SELECT t1.to_id AS from_id, t2.to_id
INTO TEMP TABLE from_to
FROM tids t1, tids t2
WHERE t1.to_id > t2.to_id;
-- Now we can use the from_to table as an index into the correlation
-- table.
SELECT c.from_id, c.to_id, c.val
FROM from_to
JOIN correlation c USING(from_id, to_id)
WHERE val > 0.5;
The explain analyze for the final select works out to:
Nested Loop (cost=0.00..50692.00 rows=8488 width=16) (actual
time=0.171..150.095 rows=2427 loops=1)
-> Seq Scan on from_to (cost=0.00..79.38 rows=5238 width=8) (actual
time=0.006..7.660 rows=4950 loops=1)
-> Index Scan using correlation_pkey on correlation c
(cost=0.00..9.63 rows=2 width=16) (actual time=0.024..0.025 rows=0
loops=4950)
Index Cond: (("outer".from_id = c.from_id) AND ("outer".to_id =
c.to_id))
Filter: (val > 0.5::double precision)
Total runtime: 152.261 ms
Richard Huxton wrote:
> Sean Davis wrote:
>
>> I answer my own question, if only for my own records. The following
>> query is about 5-6 times faster than the original. Of course, if
>> anyone else has other ideas, I'd be happy to hear them.
>>
>> Sean
>>
>> explain analyze select from_id,to_id,val from exprsdb.correlation
>> where from_id in (select to_id from exprsdb.correlation where
>> from_id=2424 order by val desc limit 100) and to_id in (select to_id
>> from exprsdb.correlation where from_id=2424 order by val desc limit
>> 100) and val>0.6 and to_id<from_id;
>
>
> Might not be any faster, but you can do this as a self-join with
> subquery:
>
> SELECT c1.from_id, c1.to_id, c1.val
> FROM
> correlation c1,
> (
> SELECT to_id FROM correlation WHERE from_id=2424
> ORDER BY val DESC LIMIT 100
> ) AS c2
> (
> SELECT to_id FROM correlation WHERE from_id=2424
> ORDER BY val DESC LIMIT 100
> ) AS c3
> WHERE
> c1.from_id = c2.to_id
> AND c1.to_id = c3.to_id
> AND c1.val > 0.5
> AND c1.to_id < from_id
> ;
>
> I think PG should be smart enough nowadays to figure out these two
> queries are basically the same.
--
Edmund Bacon <ebacon(at)onesystem(dot)com>
From | Date | Subject | |
---|---|---|---|
Next Message | Moran.Michael | 2005-03-24 16:41:34 | Funtions + plpgsql + contrib/pgcrypto = ?? |
Previous Message | Kalyani Chennupati | 2005-03-24 16:03:48 | Is there a way to find a schema name |