From: | Luca Ferrari <fluca1978(at)infinito(dot)it> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | help understanding explain output |
Date: | 2011-02-15 14:49:35 |
Message-ID: | 201102151549.35268.fluca1978@infinito.it |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Hello,
I've got a doubt about partial indexes and the path chosen by the optimizer.
Consider this simple scenario:
CREATE TABLE p( pk serial NOT NULL , val2 text, val1 text, b boolean, PRIMARY
KEY (pk) );
INSERT INTO p(pk, val1, val2, b) VALUES( generate_series(1,1000000), 'val1b',
'val2b', true );
INSERT INTO p(pk, val1, val2, b) VALUES( generate_series(1000001,2000000),
'val1Notb', 'val2Notb', false );
CREATE INDEX i_p_b ON p (b) WHERE b = true;
ANALYZE p;
So I create a table with 2-million rows, the first million with b = true and
the second one with b = false.
Now doing an explain for a query that selects only on the b attribute I got:
EXPLAIN SELECT * FROM p WHERE b = false;
QUERY PLAN
------------------------------------------------------------
Seq Scan on p (cost=0.00..34706.00 rows=1000133 width=28)
Filter: (NOT b)
So a sequential scan. I know that the optimizer will not consider an index if
it is not filtering, but I don't understand exactly why in this case. In fact,
considering that the above query could remove the first half data pages (where
b = true), and considering that:
SELECT reltype, relval1, relpages, reltuples
FROM pg_class WHERE relval1 IN ('p', 'i_p_b');
reltype | relval1 | relpages | reltuples
---------+----------+----------+-----------
615079 | p | 14706 | 2e+06
0 | i_p_b | 2745 | 999867
The sequential access requires 14706 pages, while using the index for filtering
almost the half of those, we've got 2745 + 7353 = around 10000 pages.
I've tried to change the index type to an hash, but the situation did not
change. Even with enable_seqscan = off the above query is executed
sequentially, but with a different initial cost:
EXPLAIN SELECT * FROM p WHERE b = false;
QUERY PLAN
----------------------------------------------------------------------------
Seq Scan on p (cost=10000000000.00..10000034706.00 rows=1000133 width=28)
Filter: (NOT b)
And here comes the second doubt: since in both cases the planner is doing a
sequential access, why the first case has an initial cost = 0 and this one has
a cost of 1 million?
I'm getting lost here, I need some hint to understand what is happening.
I'm running
PostgreSQL 9.0.2 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.2
20080704 (Red Hat 4.1.2-46), 64-bit
Thanks,
Luca
From | Date | Subject | |
---|---|---|---|
Next Message | Derrick Rice | 2011-02-15 15:00:23 | Re: Speeding up index scans by truncating timestamp? |
Previous Message | Emi Lu | 2011-02-15 14:35:45 | Re: read and restore deleted record |