| From: | Guillaume Lelarge <guillaume(at)lelarge(dot)info> | 
|---|---|
| To: | Luca Ferrari <fluca1978(at)infinito(dot)it> | 
| Cc: | pgsql-general(at)postgresql(dot)org | 
| Subject: | Re: help understanding explain output | 
| Date: | 2011-02-15 23:05:16 | 
| Message-ID: | 4D5B06AC.2020700@lelarge.info | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-general | 
Le 15/02/2011 15:49, Luca Ferrari a écrit :
> Hello,
> I've got a doubt about partial indexes and the path chosen by the optimizer.
> Consider this simple scenario:
> 
> CREATE TABLE p( pk serial NOT NULL , val2 text, val1 text, b boolean, PRIMARY 
> KEY (pk) );
> INSERT INTO p(pk, val1, val2, b) VALUES( generate_series(1,1000000), 'val1b', 
> 'val2b', true );
> INSERT INTO p(pk, val1, val2, b) VALUES( generate_series(1000001,2000000), 
> 'val1Notb', 'val2Notb', false );
> CREATE INDEX i_p_b ON p (b) WHERE b = true;
> ANALYZE p;
> 
> So I create a table with 2-million rows, the first million with b = true and 
> the second one with b = false.
> Now doing an explain for a query that selects only on the b attribute I got:
> 
> EXPLAIN SELECT * FROM p WHERE b = false;
>                          QUERY PLAN                         
> ------------------------------------------------------------
>  Seq Scan on p  (cost=0.00..34706.00 rows=1000133 width=28)
>    Filter: (NOT b)
> 
> 
> So a sequential scan. I know that the optimizer will not consider an index if 
> it is not filtering, but I don't understand exactly why in this case. In fact, 
> considering that the above query could remove the first half data pages (where 
> b = true), and considering that:
> 
> SELECT reltype, relval1, relpages, reltuples
> FROM pg_class WHERE relval1 IN ('p', 'i_p_b');
>  reltype | relval1  | relpages | reltuples 
> ---------+----------+----------+-----------
>   615079 | p        |    14706 |     2e+06
>        0 | i_p_b |     2745 |    999867
> 
> The sequential access requires 14706 pages, while using the index for filtering 
> almost the half of those, we've got 2745 + 7353 = around 10000 pages.
Accessing a page in an index is way costier then accessing a page in an
table with a sequential scan. By default, random_page_cost is 4 times
seq_page_cost. So it's not really surprising that when you want to get
half the table, PostgreSQL won't use the index. You would need to have a
really selective query to make an index scan interesting to use.
> I've tried to change the index type to an hash, but the situation did not 
> change. Even with enable_seqscan = off the above query is executed 
> sequentially, but with a different initial cost:
> 
> 
> EXPLAIN SELECT * FROM p WHERE b = false;
>                                  QUERY PLAN                                 
> ----------------------------------------------------------------------------
>  Seq Scan on p  (cost=10000000000.00..10000034706.00 rows=1000133 width=28)
>    Filter: (NOT b)
> 
> And here comes the second doubt: since in both cases the planner is doing a 
> sequential access, why the first case has an initial cost = 0 and this one has 
> a cost of 1 million?
When you disable enable_seqscan, you actually say to the planner to add
a really big number to the estimated cost of a seq scan. Which usually
disables the use of the seqscan. Not in your case.
-- 
Guillaume
 http://www.postgresql.fr
 http://dalibo.com
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Chris | 2011-02-15 23:08:52 | Re: help understanding explain output | 
| Previous Message | John R Pierce | 2011-02-15 22:46:11 | Re: subset of attributes |