Re: Why isn't an index scan being used?

From: Abi Noda <a(at)abinoda(dot)com>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>, andrew(at)tao11(dot)riddles(dot)org(dot)uk
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Why isn't an index scan being used?
Date: 2019-02-20 04:58:43
Message-ID: CAM37AMO6vYESa6rrY28L2NPXoEogFUSeLEqsGUz9JFC17GRxFA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Thanks Justin.

The 4ms different in the examples isn't an accurate benchmark. I'm seeing
about a ~20% difference over a larger sample size. And this is on a fork of
the production database.

Apart from the end-performance, I'm motivated to figure out why one index
results in an index scan whereas the other one does not.

I didn't mention this in my original email but I've separately tested
dropping the `state` index, running VACUUM FULL on the table, then
recreating both indexes. The result was the same where querying on state
produced an index scan whereas closed produced a bitmap scan.

Andrew's email and Michael's follow-up has me curious because it suggests
I'm running into a issue specific to indexing on IS NULL, @Justin what do
you think of this?

In the meantime Justin I'll investigate some more of your suggestions.

On Tue, Feb 19, 2019 at 9:37 PM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:

> On Tue, Feb 19, 2019 at 05:10:43PM -0700, Abi Noda wrote:
> > I have a table as defined below. The table contains 1,027,616 rows,
> 50,349
> > of which have state='open' and closed IS NULL. Since closed IS NULL for
> all
> > rows where state='open', I want to remove the unnecessary state column.
> >
> > CREATE TABLE tickets (
> > id bigserial primary key,
> > state character varying,
> > closed timestamp,
> ...
> > );
> >
> > CREATE INDEX "state_index" ON "tickets" ("state") WHERE ((state)::text =
> > 'open'::text));
> >
> > As part of the process of removing the state column, I am trying to index
> > the closed column so I can achieve equal query performance (index scan)
> as
> > when I query on the state column as shown below:
> >
> > EXPLAIN ANALYZE select title, created, closed, updated from tickets
> where state = 'open';
> > Index Scan using state_index on tickets (cost=0.29..23430.20 rows=50349
> width=64) (actual time=17.221..52.110 rows=51533 loops=1)
> >
> > However, when I index the closed column, a bitmap scan is used instead of
> > an index scan, with slightly slower performance. Why isn't an index scan
> > being used, given that the exact same number of rows are at play as in my
> > query on the state column? How do I index closed in a way where an index
> > scan is used?
> >
> > CREATE INDEX closed_index ON tickets (id) WHERE closed IS NULL;
> > EXPLAIN ANALYZE select title, created, closed, updated from tickets
> where closed IS NULL;
> > Bitmap Heap Scan on tickets (cost=824.62..33955.85 rows=50349 width=64)
> (actual time=10.420..56.095 rows=51537 loops=1)
> > -> Bitmap Index Scan on closed_index (cost=0.00..812.03 rows=50349
> width=0) (actual time=6.005..6.005 rows=51537 loops=1)
>
> Are you really concerned about 4ms ? If this is a toy-sized test system,
> please try on something resembling production, perhaps by loading
> production or
> fake data, or perhaps on a production system within a transactions (begin;
> CREATE
> INDEX CONCURRENTLY; explain ...; rollback).
>
> You can see that most of the estimated cost is from the table (the index
> scan
> accounts for only 812 of total 33955 cost units). So I'm guessing the
> planner
> thinks that an index scan will either 1) access the table randomly;
> and/or, 2)
> access a large fraction of the table.
>
> If it was just built, the first (partial/conditional/predicate/where) index
> will scan table in its "physical" order (if not sequentially).
>
> The 2nd index is going to scan table in order of ID, which I'm guessing is
> not
> "correlated" with its physical order, so an index scan cost is computed as
> accessing a larger fraction of the table (but by using an "bitmap" scan
> it's at
> least in physical order). In fact: 50349/17478 = ~3 tuples/page is low, so
> you're accessing a large fraction of the table to return a small fraction
> of
> its tuples.
>
> You can check what it thinks here:
>
> https://wiki.postgresql.org/wiki/Slow_Query_Questions#Statistics:_n_distinct.2C_MCV.2C_histogram
>
> You could try CLUSTERing the table on ID (which requires a non-partial
> index)
> and ANALYZEing (which might cause this and other queries to be planned
> and/or
> perform differently). That causes the table to be locked exclusively.
> Then,
> the planner knows that scanning index and returning results ordered by IDs
> (which doesn't matter) will also access table in physical order (which
> matters), and maybe fewer pages need to be read, too.
>
> Justin
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Justin Pryzby 2019-02-20 08:05:00 Re: Why isn't an index scan being used?
Previous Message Justin Pryzby 2019-02-20 04:37:48 Re: Why isn't an index scan being used?