Re: BUG #8598: Row count estimates of partial indexes

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Marko Tiikkaja <marko(at)joh(dot)to>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pg Bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #8598: Row count estimates of partial indexes
Date: 2013-11-17 21:06:17
Message-ID: CAMkU=1wdiLuJ5e9mOD8h_tVxXj45=6DOeRPd-MCkT42CV3_44Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Sun, Nov 17, 2013 at 11:55 AM, Marko Tiikkaja <marko(at)joh(dot)to> wrote:

> On 11/17/13, 5:29 PM, Tom Lane wrote:
>
>> marko(at)joh(dot)to writes:
>>
>>> We have the following partial index on a small subset of a larger table:
>>> "index_transactions_transaction_balance_details" btree
>>> (transactionid)
>>> WHERE NOT processed AND accountbalancesdailyid IS NOT NULL
>>> However, querying with the WHERE clause completely ignores the
>>> pg_class.reltuples value for the index:
>>>
>>
>> Yup. Row count estimates are derived by estimating the selectivity of the
>> given WHERE clauses and multiplying by the (estimated) current table size.
>> In the particular case you show here, with a partial index that *exactly*
>> matches the WHERE clause, we could get a better answer by looking at the
>> index size --- but that doesn't scale to any less simplistic case, such
>> as a query with additional WHERE clauses.
>>
>> It's also important to realize that reltuples for an index is a whole lot
>> less trustworthy than it is for a table; ANALYZE doesn't update the
>> former, for example. And scaling from the last-reported VACUUM stats
>> to current reality is going to be shakier.
>>
>> So on the whole, I don't think this would be a good idea.
>>
>
> Any suggestions for a workaround? When reading this index as a part of a
> bigger query the horrible estimate ensures that nobody's having fun. I
> currently have something like:
>

Define a new column which is true iff the where condition is true? It
sounds like that one magic combination has a meaning all of its own, so it
would make sense to encode it in one column.

>
> SELECT * FROM
> (SELECT * FROM transactions WHERE <partial index> LIMIT 25000)
> transactions
> JOIN ..
>
> And I *really* don't like that as a workaround.
>

I've wanted a function that always returns true, but which the planner
things returns false most of the time, for use in such situations. It
looks like you can make one of these with a compiled module (by creating an
operator and then wrapping that in a function), but I have not found a way
to do it without using C. (CREATE FUNCTION takes a COST and ROWS, but not
a SELECTIVITY.)

Cheers,

Jeff

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Marko Tiikkaja 2013-11-18 10:05:04 Re: BUG #8598: Row count estimates of partial indexes
Previous Message Tom Lane 2013-11-17 20:18:39 Re: BUG #8598: Row count estimates of partial indexes