Estimates on partial index

From: Victor Yegorov <vyegorov(at)gmail(dot)com>
To: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Estimates on partial index
Date: 2016-08-18 13:52:11
Message-ID: CAGnEbohwiUPY1-x6=u4bJuAvnhr1oNYP_oTpWOPVcdCsyisjyw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Greetings.

I have a question on why planner chooses `IndexScan` for the following
query:

SELECT la.loan_id, la.due_date, la.is_current
FROM loan_agreements la WHERE la.is_current AND '2016-08-11' >
la.due_date;

Relevant (cannot post it all, sorry) table definition is:

Column Type Modifiers
------------------------------ --------------------------- ---------
id bigint not null
...
is_current boolean not null
due_date date not null
loan_id bigint

Indexes:
"loan_agreements_pkey" PRIMARY KEY, btree (id)
...
"idx_loan_agreements_due_date" btree (due_date)
"idx_loan_agreemnets_loan_id_cond_is_current_true" btree (loan_id)
WHERE is_current = true

Some stats:
SELECT relname,reltuples::numeric,relpages FROM pg_class WHERE oid IN
('loan_agreements'::regclass,
'idx_loan_agreemnets_loan_id_cond_is_current_true'::regclass,
'idx_loan_agreements_due_date'::regclass);
relname reltuples relpages
------------------------------------------------ --------- --------
idx_loan_agreements_due_date 664707 1828
idx_loan_agreemnets_loan_id_cond_is_current_true 237910 655
loan_agreements 664707 18117

Settings:

SELECT name,setting,unit FROM pg_settings WHERE name ~
'(buffers|mem|cost)$';
name setting unit
-------------------- -------- ----
autovacuum_work_mem 524288 kB
cpu_index_tuple_cost 0.005 ¤
cpu_operator_cost 0.0025 ¤
cpu_tuple_cost 0.01 ¤
maintenance_work_mem 16777216 kB
random_page_cost 2.5 ¤
seq_page_cost 1 ¤
shared_buffers 1572864 8kB
temp_buffers 8192 8kB
wal_buffers 2048 8kB
work_mem 65536 kB
PostgreSQL 9.4.8 on x86_64-unknown-linux-gnu, compiled by gcc (GCC)
4.4.7 20120313 (Red Hat 4.4.7-16), 64-bit

Planner chooses the following plan:

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using idx_loan_agreemnets_loan_id_cond_is_current_true on
loan_agreements la (cost=0.42..16986.53 rows=226145 width=13) (actual
time=0.054..462.394 rows=216530 loops=1)
Filter: ('2016-08-11'::date > due_date)
Rows Removed by Filter: 21304
Buffers: shared hit=208343 read=18399
Planning time: 0.168 ms
Execution time: 479.773 ms

If I disable IndexScans, plan changes likes this:

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on loan_agreements la (cost=2884.01..23974.88
rows=226145 width=13) (actual time=38.893..200.376 rows=216530 loops=1)
Recheck Cond: is_current
Filter: ('2016-08-11'::date > due_date)
Rows Removed by Filter: 21304
Heap Blocks: exact=18117
Buffers: shared hit=18212 read=557
-> Bitmap Index Scan on
idx_loan_agreemnets_loan_id_cond_is_current_true (cost=0.00..2827.47
rows=237910 width=0) (actual time=35.166..35.166 rows=237853 loops=1)
Buffers: shared hit=119 read=533
Planning time: 0.171 ms
Execution time: 214.341 ms

Question is — why IndexScan over partial index is estimated less than
BitmapHeap + BitmapIndex scan. And how can I tell Planner, that IndexScan
over 1/3 of table is not a good thing — IndexScan is touching 10x more
pages and in a typical situation those are cold.

Thanks in advance.

--
Victor Y. Yegorov

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2016-08-18 13:56:10 Re: Estimates on partial index
Previous Message pinker 2016-08-17 11:45:03 Big data INSERT optimization - ExclusiveLock on extension of the table