Selecting records not present in related tables

From: Hector Rosas <jeziel(dot)rosas(at)gmail(dot)com>
To: pgsql-sql(at)postgresql(dot)org
Subject: Selecting records not present in related tables
Date: 2005-10-06 19:43:33
Message-ID: 25266c5c0510061243g13c3f49cudde160c67f961bca@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Hello, I'm trying to select records in a table not present in a related
table, in example, I've a table with message information (subject, message,
date, etc) and another (usermessages) with where user(s) has that message,
its state, etc. Records in this table will be deleted in a certain time
(just some extra info).
I want to select messages records that aren't present in the other table
(usermessages), I got the next two queries, maybe someone can suggest a
better one.

SELECT m.id <http://m.id> FROM messages AS m
WHERE (SELECT count(um.*) FROM usermessages AS um WHERE um.idmessage=m.id
)=0;

SELECT m.id <http://m.id> FROM messages AS m where id NOT IN (select
um.idmessage FROM usermessages um);

Both queries work, but doing a EXPLAIN ANALYZE I got the next results.

bd=# explain analyze SELECT m.id <http://m.id> FROM messages AS m
bd-# WHERE (SELECT count(um.*) FROM usermessages AS um WHERE
um.idmessage=m.id)=0;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on messages m (cost=0.00..3915.75 rows=3 width=4) (actual time=
40.531..40.531 rows=0 loops=1)
Filter: ((subplan) = 0)
SubPlan
-> Aggregate (cost=9.11..9.11 rows=1 width=4) (actual
time=0.098..0.104rows=1 loops=355)
-> Index Scan using message_selection on usermessages um
(cost=0.00..9.10rows=3 width=4) (actual time=
0.067..0.078 rows=1 loops=355)
Index Cond: (idmessage = $0)
Total runtime: 40.605 ms
(7 rows)

bd=# explain analyze select m.id <http://m.id> FROM messages AS m where id
NOT IN (select um.idmessage FROM usermessages um);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Seq Scan on messages m (cost=9.68..43.00 rows=213 width=4) (actual time=
20.329..20.329 rows=0 loops=1)
Filter: (NOT (hashed subplan))
SubPlan
-> Seq Scan on usermessages um (cost=0.00..8.54 rows=454 width=4) (actual
time=0.008..13.094 rows=454 loops=1)
Total runtime: 20.386 ms
(5 rows)

In first query, cost can be between 0 and almost 4 sec, and also I see that
loops value, that I don't know what performance issues could arise.
In second query, I see a seq scan, which I don't like, I think that with too
many records this query could take ages, or maybe not, but loops value is 1.

I hope someone can give some advice with those queries , or maybe a better
query. I've not decided which query I'm going to use, thanks!

Jeziel.

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Anthony Molinaro 2005-10-06 20:11:44 Re: Selecting records not present in related tables
Previous Message Michael Fuhr 2005-10-06 18:37:45 Re: DATESTYLE and 0000-00-00