From: | "Mason Hale" <masonhale(at)gmail(dot)com> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | unexpected results with NOT IN query |
Date: | 2008-03-20 14:45:25 |
Message-ID: | 8bca3aa10803200745q147e06f6j39ef9c5ab4b21b52@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Hello --
I'm getting some unexpected results with a NOT IN query -- this is on 8.2.5.
This is the query in question:
prod_2=> select id from feed_download_task where id in (111102466,141701504)
and id not in (select last_feed_download_task_id from subscription);
id
----
(0 rows)
This query returns zero rows, but I expect it to return 1 row, because I
know that 111102466 *is not* in (select last_feed_download_task_id from
subscription) and I know that 1471701504 *is* in that set, as demonstrated
below:
Verify that both id values are in the feed_download_task table:
prod_2=> select id from feed_download_task where id in
(111102466,141701504);
id
-----------
141701504
111102466
(2 rows)
Verify that 111102466 is NOT in the set of last_feed_download_task_id's, and
that 141701504 is in this set:
prod_2=> select last_feed_download_task_id from subscription where
last_feed_download_task_id in (111102466,141701504);
last_feed_download_task_id
----------------------------
141701504
(1 row)
Here's the problem query again, with explain analyze.
prod_2=> select id from feed_download_task where id in (111102466,141701504)
and id not in (select last_feed_download_task_id from subscription);
id
----
(0 rows)
prod_2=> explain analyze select id from feed_download_task where id in
(111102466,141701504) and id not in (select last_feed_download_task_id from
subscription);
QUERY
PLAN
---------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on feed_download_task (cost=45077.24..45083.27 rows=1
width=4) (actual time=601.229..601.229 rows=0 loops=1)
Recheck Cond: (id = ANY ('{111102466,141701504}'::integer[]))
Filter: (NOT (hashed subplan))
-> Bitmap Index Scan on feed_download_task_pkey
(cost=0.00..30.52rows=2 width=0) (actual time=
0.095..0.095 rows=2 loops=1)
Index Cond: (id = ANY ('{111102466,141701504}'::integer[]))
SubPlan
-> Seq Scan on subscription (cost=0.00..44097.78 rows=379578 width=4)
(actual time=0.032..488.193 rows=162365 loops=1)
Total runtime: 601.281 ms
(8 rows)
I've tried re-analyzing and re-indexing the tables involved in this query,
but I still left scratching my head.
I am also aware that I can use a left join instead of a NOT IN query -- but
in this case I need to use this in a DELETE statement, which eliminates the
possibility of the left join (I think).
Here is a version using a left outer join, it returns the expected result:
prod_2=# select feed_download_task.id from feed_download_task left join
subscription on (subscription.last_feed_download_task_id =
feed_download_task.id) where feed_download_task.id in (111102466,141701504)
and subscription.id IS NOT NULL;
id
-----------
141701504
(1 row)
Here is the explain analyze output for the above query:
prod_2=# explain analyze select feed_download_task.id from
feed_download_task left join subscription on (
subscription.last_feed_download_task_id = feed_download_task.id) where
feed_download_task.id in (111102466,141701504) and subscription.id IS NOT
NULL;
QUERY
PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=31.19..51.69 rows=1 width=4) (actual
time=0.158..0.210rows=1 loops=1)
-> Bitmap Heap Scan on feed_download_task (cost=31.19..37.21 rows=2
width=4) (actual time=0.120..0.134 rows=2 loops=1)
Recheck Cond: (id = ANY ('{111102466,141701504}'::integer[]))
-> Bitmap Index Scan on feed_download_task_pkey
(cost=0.00..31.19rows=2 width=0) (actual time=
0.102..0.102 rows=2 loops=1)
Index Cond: (id = ANY ('{111102466,141701504}'::integer[]))
-> Index Scan using index_subscription_on_last_feed_download_task_id on
subscription (cost=0.00..7.23 rows=1 width=4) (actual
time=0.036..0.037rows=0 loops=2)
Index Cond: (subscription.last_feed_download_task_id =
feed_download_task.id)
Filter: (id IS NOT NULL)
I feel like I must be missing something obvious.
Thanks in advance for the assistance.
cheers,
Mason
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2008-03-20 15:15:49 | Re: unexpected results with NOT IN query |
Previous Message | Adrian Klaver | 2008-03-20 14:43:39 | Re: Row size overhead |