Re: seqscan for 100 out of 3M rows, index present

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Willy-Bas Loos <willybas(at)gmail(dot)com>
Cc: Igor Neyman <ineyman(at)perceptron(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: seqscan for 100 out of 3M rows, index present
Date: 2013-06-26 20:59:43
Message-ID: CAMkU=1xoKeo94UYx3W3v8emPzUnkN6FcotYcaH8Ooyw1HZ6ASg@mail.gmail.com
Lists: pgsql-performance

On Wed, Jun 26, 2013 at 12:18 PM, Willy-Bas Loos <willybas(at)gmail(dot)com> wrote:

> plan with enable_seqscan off:
>
> Aggregate (cost=253892.48..253892.49 rows=1 width=0) (actual
> time=208.681..208.681 rows=1 loops=1)
>

The estimated cost of this is about 4 times greater than the estimated cost of
the sequential scan. It should be easy to tweak things to get those to
reverse, but will doing so mess up other queries that are currently OK?
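
If you do experiment, something along these lines (the query is just my
reconstruction from your plan, so treat it as a sketch) keeps the change
local to one session while you compare plans:

    -- Lowering random_page_cost makes index access look cheaper to the
    -- planner; 4.0 is the default.
    SET random_page_cost = 2.0;
    EXPLAIN ANALYZE
    SELECT count(*)
    FROM geo g
    JOIN bmp_data d ON d.geo_id = g.geo_id
    WHERE g.blok = 1942;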

> -> Nested Loop (cost=5.87..253889.49 rows=1198 width=0) (actual
> time=69.403..208.647 rows=17 loops=1)
>

The estimated number of rows is off by about 70-fold. Most of this is probably
due to cross-column correlations, which you probably can't do much about.
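
One way to check whether that is what is going on (again, queries sketched
from the plan above): compare the estimates in isolation and combined. If
the single-table estimates are reasonable but the joined estimate is far
off, correlation between blok and geo_id is the likely culprit:

    EXPLAIN ANALYZE SELECT count(*) FROM geo WHERE blok = 1942;

    EXPLAIN ANALYZE
    SELECT count(*)
    FROM geo g
    JOIN bmp_data d ON d.geo_id = g.geo_id
    WHERE g.blok = 1942;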

> -> Index Scan using geo_blok_idx on geo g (cost=0.00..1314.43
> rows=500 width=8) (actual time=45.776..46.147 rows=121 loops=1)
> Index Cond: (blok = 1942)
>

It thinks it will find 500 rows (a suspiciously round number?) but actually
finds 121. That is off by a factor of 4. Why does it not produce a better
estimate for such a simple, histogram-based estimation? Was the table
analyzed recently? Have you tried increasing default_statistics_target?
If you choose values of blok other than 1942, what are the results like?
This estimate feeds multiplicatively into the inner-loop estimates, so
it is a powerful factor in driving the plan choice.
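
For example (a sketch; raising the target only for that column avoids
changing how statistics are collected everywhere else):

    -- Collect a larger sample and more detailed histogram for geo.blok,
    -- then refresh the statistics.
    ALTER TABLE geo ALTER COLUMN blok SET STATISTICS 1000;
    ANALYZE geo;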

> -> Bitmap Heap Scan on bmp_data d (cost=5.87..502.91 rows=179
> width=8) (actual time=1.340..1.341 rows=0 loops=121)
> Recheck Cond: (geo_id = g.geo_id)
> -> Bitmap Index Scan on bmp_data_geo_idx (cost=0.00..5.82
> rows=179 width=0) (actual time=1.206..1.206 rows=0 loops=121)
> Index Cond: (geo_id = g.geo_id)
> Total runtime: 208.850 ms
>
>
So it is only twice as fast as the sequential scan anyway. Were you
expecting it to be even faster than that? Unless it is the dominant query in
your database, I would usually not consider a factor-of-2 improvement to be
worth worrying about, as it is too likely you will make something else
worse in the process.

Cheers,

Jeff
