Re: Query not using index pgsql 8.2.3

From: Henrik Zagerholm <henke(at)mac(dot)se>
To: Michael Fuhr <mike(at)fuhr(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Query not using index pgsql 8.2.3
Date: 2007-03-23 13:04:19
Message-ID: CF8A4C11-DB11-450E-AA64-0FC6537AFC7D@mac.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


23 mar 2007 kl. 13:34 skrev Michael Fuhr:

> On Fri, Mar 23, 2007 at 12:41:58PM +0100, Henrik Zagerholm wrote:
>> 23 mar 2007 kl. 12:33 skrev Michael Fuhr:
>>> The row count estimate for fk_filetype_id = 83 is high by an order
>>> of magnitude:
>>>
>>>> Bitmap Index Scan on tbl_file_idx6 (cost=0.00..25.65 rows=1251
>>>> width=0) (actual time=21.958..21.958 rows=112 loops=1)
>>>>
>>>> Index Cond: (fk_filetype_id = 83)
>>>
>>> Have you run ANALYZE or VACUUM ANALYZE on these tables recently?
>>> If so then you might try increasing the statistics target for
>>> tbl_file.fk_filetype_id and perhaps some of the columns in the join
>>> conditions.
>>
>> I did a vacuum full and reindex on all tables.
>
> VACUUM FULL is seldom (if ever) necessary if you're running plain
> VACUUM (without FULL) often enough, either manually or via autovacuum.
>
>> Now I also did a vacuum analyze on tbl_acl (the biggest table with
>> about 4.5 millin rows)
>>
>> Same result.
>
> I'd suggest analyzing all tables. The bad estimate I mentioned
> appears to be for a column in tbl_file so if you didn't analyze
> that table then the query plan probably won't improve.
>
>> But do you mean if the row_count estimate is big it can't use any
>> index on any other table within the JOINs?
>
> High row count estimates make the planner think that scanning entire
> tables would be faster than using indexes. The more of a table a
> query must fetch the less efficient an index scan becomes, to the
> point that a sequential scan is faster than an index scan.
>
>> Any specific parameters I should adjust?
>
> If analyzing the tables doesn't improve the row count estimates
> then try increasing some columns' statistics targets and re-analyze
> the table or just that column. Example:
>
> ALTER TABLE tbl_file ALTER fk_filetype_id SET STATISTICS 100;
> ANALYZE tbl_file (fk_filetype_id);
>
I analyzed all tables and found no difference but after changing the
statistics for fk_filetype_id I got way better results.
Still there is a big cost on table tbl_file_structure_id it is making
a Bitmap heap scan.
Could I do something to improve this?

Thanks for all your help so far!

Limit (cost=115168.22..115169.12 rows=20 width=173) (actual
time=5138.401..5138.982 rows=20 loops=1)
-> GroupAggregate (cost=115168.22..115341.60 rows=3853
width=173) (actual time=5138.394..5138.929 rows=20 loops=1)
-> Sort (cost=115168.22..115177.85 rows=3853 width=173)
(actual time=5138.339..5138.430 rows=73 loops=1)
Sort Key: tbl_file.file_name, tbl_file.file_ctime,
tbl_structure.structure_path, tbl_computer.pk_computer_id,
tbl_filetype.filetype_icon, tbl_computer.computer_name,
tbl_share.share_name, tbl_share.share_path,
tbl_file_structure.fk_file_id
-> Nested Loop (cost=8.48..114938.74 rows=3853
width=173) (actual time=185.588..5118.684 rows=730 loops=1)
-> Seq Scan on tbl_filetype (cost=0.00..1.25
rows=1 width=18) (actual time=0.036..0.053 rows=1 loops=1)
Filter: (83 = pk_filetype_id)
-> Hash Join (cost=8.48..114898.96 rows=3853
width=171) (actual time=185.538..5116.063 rows=730 loops=1)
Hash Cond: (tbl_archive.fk_share_id =
tbl_share.pk_share_id)
-> Nested Loop (cost=6.27..114843.77
rows=3853 width=146) (actual time=167.608..5093.255 rows=730 loops=1)
-> Nested Loop
(cost=6.27..32079.13 rows=1167 width=146) (actual
time=130.594..3448.927 rows=330 loops=1)
Join Filter:
(tbl_archive.pk_archive_id = tbl_structure.fk_archive_id)
-> Nested Loop
(cost=4.89..31185.00 rows=1167 width=138) (actual
time=119.861..3402.527 rows=330 loops=1)
-> Nested Loop
(cost=4.89..24413.49 rows=1167 width=56) (actual
time=82.203..2097.142 rows=330 loops=1)
-> Index Scan
using tbl_file_idx6 on tbl_file (cost=0.00..344.56 rows=114
width=40) (actual time=13.164..15.475 rows=106 loops=1)
Index Cond:
(fk_filetype_id = 83)
-> Bitmap Heap
Scan on tbl_file_structure (cost=4.89..210.27 rows=69 width=24)
(actual time=8.736..19.606 rows=3 loops=106)
Recheck
Cond: (tbl_file.pk_file_id = tbl_file_structure.fk_file_id)
-> Bitmap
Index Scan on tbl_file_structure_idx (cost=0.00..4.88 rows=69
width=0) (actual time=2.574..2.574 rows=3 loops=106)
Index
Cond: (tbl_file.pk_file_id = tbl_file_structure.fk_file_id)
-> Index Scan using
tbl_structure_pkey on tbl_structure (cost=0.00..5.79 rows=1
width=98) (actual time=3.942..3.945 rows=1 loops=330)
Index Cond:
(tbl_structure.pk_structure_id = tbl_file_structure.fk_structure_id)
-> Materialize
(cost=1.37..1.71 rows=34 width=16) (actual time=0.034..0.077 rows=34
loops=330)
-> Seq Scan on
tbl_archive (cost=0.00..1.34 rows=34 width=16) (actual
time=10.606..10.693 rows=34 loops=1)
Filter:
archive_complete
-> Index Scan using tbl_acl_idx on
tbl_acl (cost=0.00..70.47 rows=36 width=8) (actual time=4.964..4.971
rows=2 loops=330)
Index Cond:
(tbl_acl.fk_file_structure_id = tbl_file_structure.pk_file_structure_id)
-> Hash (cost=2.16..2.16 rows=4
width=41) (actual time=17.890..17.890 rows=4 loops=1)
-> Hash Join (cost=1.07..2.16
rows=4 width=41) (actual time=17.848..17.872 rows=4 loops=1)
Hash Cond:
(tbl_share.fk_computer_id = tbl_computer.pk_computer_id)
-> Seq Scan on tbl_share
(cost=0.00..1.04 rows=4 width=29) (actual time=0.093..0.099 rows=4
loops=1)
-> Hash (cost=1.03..1.03
rows=3 width=20) (actual time=17.724..17.724 rows=3 loops=1)
-> Seq Scan on
tbl_computer (cost=0.00..1.03 rows=3 width=20) (actual
time=17.697..17.706 rows=3 loops=1)
Total runtime: 5139.608 ms

> The default statistics target is 10; the maximum value is 1000.
>
> On some systems I've found that reducing random_page_cost from 4
> (the default) to 2 gives more realistic cost estimates for index
> scans.
>
> --
> Michael Fuhr

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Henrik Zagerholm 2007-03-23 13:13:57 Re: Query not using index pgsql 8.2.3
Previous Message Alban Hertroys 2007-03-23 12:34:44 Re: Query not using index pgsql 8.2.3