Re: Strange choice of general index over partial index

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Strange choice of general index over partial index
Date: 2015-01-15 23:39:36
Message-ID: CAMkU=1yDabjJSDZQtwc8QZ71tUxLs-RWTdAE8Q58-ugVmzK_TQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Thu, Jan 15, 2015 at 2:30 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:

> This is an obfuscation and mock up, but:
>
> table files (
> id serial pk,
> filename text not null,
> state varchar(20) not null
> ... 18 more columns
> )
>
> index file_state on (state)
> (35GB in size)
> index file_in_flight_state (state) where state in (
> 'waiting','assigning', 'processing' )
> (600MB in size)
> ... 10 more indexes
>
> More important facts:
> * state = 'done' 95% of the time. thereform the partial index
> represents only 5% of the table
> * all indexes and the table are very bloated
> * server has 128GB RAM
> * Version 9.2.
>
> Given this setup, I would expect the planner to *always* choose
> file_in_flight_state over file_state for this query:
>
> SELECT id, filename FROM files WHERE state = 'waiting';
>
> ... and yet it keeps selecting file_state based on extremely small
> changes to the stats. This is important because the same query, using
> file_state, is 20X to 50X slower, because that index frequently gets
> pushed out of memory.
>
> What am I missing? Or is this potentially a planner bug for costing?
>

I wonder if this could be related to 3e9960e9d935e7e7c12e78441, which first
appeared in 9.2.3.

But I don't know why the small index *should* be better. If this query is
frequent, it should have no problem keeping just those leaf pages that
contain the 'waiting' rows out of the full index in memory, without having
to keep the 'done' leaf pages around. And if it is not frequent, then it
would have just as much problem keeping the smaller index in memory as it
would a small portion of the large index.

Of course if it randomly switches back and forth, now you have to keep
twice as much data in memory, the relevant parts of both indexes.

What is the point of having the full index at all, in this case?

Cheers,

Jeff

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Mark Kirkwood 2015-01-16 00:37:06 Re: Strange choice of general index over partial index
Previous Message Tom Lane 2015-01-15 23:24:05 Re: shared_buffers vs Linux file cache