From: | Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz> |
---|---|
To: | Josh Berkus <josh(at)agliodbs(dot)com>, postgres performance list <pgsql-performance(at)postgresql(dot)org> |
Subject: | Re: Strange choice of general index over partial index |
Date: | 2015-01-16 02:32:00 |
Message-ID: | 54B87820.3090101@catalyst.net.nz |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
On 16/01/15 13:37, Mark Kirkwood wrote:
> On 16/01/15 11:30, Josh Berkus wrote:
>> This is an obfuscation and mock up, but:
>>
>> table files (
>> id serial pk,
>> filename text not null,
>> state varchar(20) not null
>> ... 18 more columns
>> )
>>
>> index file_state on (state)
>> (35GB in size)
>> index file_in_flight_state (state) where state in (
>> 'waiting','assigning', 'processing' )
>> (600MB in size)
>> ... 10 more indexes
>>
>> More important facts:
>> * state = 'done' 95% of the time. thereform the partial index
>> represents only 5% of the table
>> * all indexes and the table are very bloated
>> * server has 128GB RAM
>> * Version 9.2.
>>
>> Given this setup, I would expect the planner to *always* choose
>> file_in_flight_state over file_state for this query:
>>
>> SELECT id, filename FROM files WHERE state = 'waiting';
>>
>> ... and yet it keeps selecting file_state based on extremely small
>> changes to the stats. This is important because the same query, using
>> file_state, is 20X to 50X slower, because that index frequently gets
>> pushed out of memory.
>>
>> What am I missing? Or is this potentially a planner bug for costing?
>>
>
> Are you seeing a bitmapscan access plan? If so see if disabling it gets
> you a plan on the files_in_flight index. I'm seeing this scenario with a
> fake/generated dataset a bit like yours in 9.2 (9.5 uses the
> files_in_flight w/o any coercing).
>
FWIW: For me 9.2 and 9.3 (default config) generate plans like:
state=# EXPLAIN ANALYZE
SELECT id, filename
FROM files
WHERE state = 'processing';
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on files (cost=3102.02..89228.68 rows=164333
width=15) (actual time=26.629..803.507 rows=166696 loops=1)
Recheck Cond: ((state)::text = 'processing'::text)
Rows Removed by Index Recheck: 7714304
-> Bitmap Index Scan on file_state (cost=0.00..3060.93 rows=164333
width=0) (actual time=25.682..25.682 rows=166696 loops=1)
Index Cond: ((state)::text = 'processing'::text)
Total runtime: 808.662 ms
(6 rows)
whereas 9.4 and 9.5 get:
QUERY
PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Index Scan using file_in_flight on files (cost=0.42..62857.39
rows=158330 width=15) (actual time=0.055..202.732 rows=166696 loops=1)
Index Cond: ((state)::text = 'processing'::text)
Planning time: 24.203 ms
Execution time: 208.926 ms
(4 rows)
This is with each version loading exactly the same dataset (generated by
the attached scripty). Obviously this is a vast simplification of what
Josh is looking at - but it is (hopefully) interesting that these later
versions are doing so much better...
Cheers
Mark
Attachment | Content-Type | Size |
---|---|---|
gendata.pl | application/x-perl | 1.1 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2015-01-16 03:03:53 | Re: Strange choice of general index over partial index |
Previous Message | Tom Lane | 2015-01-16 01:48:24 | Re: Strange choice of general index over partial index |