From: | "Jim Carroll" <jim(at)carroll(dot)com> |
---|---|
To: | <pgsql-performance(at)postgresql(dot)org> |
Subject: | Postgres inconsistent use of Index vs. Seq Scan |
Date: | 2015-03-13 21:27:29 |
Message-ID: | 011801d05dd4$8226b890$867429b0$@carroll.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
I'm having difficulty understanding what I perceive as an inconsistency in
how the postgres parser chooses to use indices. We have a query based on NOT
IN against an indexed column that the parser executes sequentially, but
when we perform the same query as IN, it uses the index.
I've created a simplistic example that I believe demonstrates the issue,
notice this first query is sequential
CREATE TABLE node
(
id SERIAL PRIMARY KEY,
vid INTEGER
);
CREATE INDEX x ON node(vid);
INSERT INTO node(vid) VALUES (1),(2);
EXPLAIN ANALYZE
SELECT *
FROM node
WHERE NOT vid IN (1);
Seq Scan on node (cost=0.00..36.75 rows=2129 width=8) (actual
time=0.009..0.010 rows=1 loops=1)
Filter: (vid <> 1)
Rows Removed by Filter: 1
Total runtime: 0.025 ms
But if we invert the query to IN, you'll notice that it now decided to use
the index
EXPLAIN ANALYZE
SELECT *
FROM node
WHERE vid IN (2);
Bitmap Heap Scan on node (cost=4.34..15.01 rows=11 width=8) (actual
time=0.017..0.017 rows=1 loops=1)
Recheck Cond: (vid = 1)
-> Bitmap Index Scan on x (cost=0.00..4.33 rows=11 width=0) (actual
time=0.012..0.012 rows=1 loops=1)
Index Cond: (vid = 1)
Total runtime: 0.039 ms
Can anyone shed any light on this? Specifically, is there a way to re-write
out NOT IN to work with the index (when obviously the result set is not as
simplistic as just 1 or 2).
We are using Postgres 9.2 on CentOS 6.6
From | Date | Subject | |
---|---|---|---|
Next Message | Varadharajan Mukundan | 2015-03-13 22:03:05 | Re: Performance issues |
Previous Message | Tomas Vondra | 2015-03-13 21:20:20 | Re: Performance issues |