Re: Index scan cost calculation

From: Glyn Astill <glynastill(at)yahoo(dot)co(dot)uk>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Index scan cost calculation
Date: 2015-11-26 17:50:55
Message-ID: 412855216.18274544.1448560255082.JavaMail.yahoo@mail.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

----- Original Message -----

> From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
> To: Glyn Astill <glynastill(at)yahoo(dot)co(dot)uk>
> Cc: Pgsql-performance <pgsql-performance(at)postgresql(dot)org>
> Sent: Thursday, 26 November 2015, 16:44
> Subject: Re: [PERFORM] Index scan cost calculation
>
> Glyn Astill <glynastill(at)yahoo(dot)co(dot)uk> writes:
>> Using pg 9.4.5 I'm looking at a table set up by a 3rd party application
> and trying to figure out why a particular index is being chosen over another for
> updates/deletes.
>> From what I can see the reason is that plans using either index have the
> same exactly the same cost. So rather I'm asking if there's something
> glaringly obvious I'm missing, or is there anything I can to to get better
> estimates.
>
> I think what's happening is that it's estimating that exactly one index
> tuple needs to be visited in both cases, so that the cost estimates come
> out the same. That's correct in the one case but overly optimistic in the
> other; the misestimate likely is a consequence of the index columns being
> interdependent. For instance, if "type" can be predicted from the
> other
> columns then specifying it isn't really adding anything to the query
> selectivity, but the planner won't know that. We can conclude from the
> results you've shown that the planner thinks that show+type+best+block
> is sufficient to uniquely determine a table entry, which implies that
> at least some of those columns are strongly correlated with row+seat.
>
> The problem will probably go away by itself as your table grows, but
> if you don't want to wait, you might want to reflect on which of the index
> columns might be (partially?) functionally dependent on the other columns,
> and whether you could redesign the key structure to avoid that.

Many thanks for the explanation, is such a functional dependency assumed purely based optimistically on statistics gathered by analyze? My (ignorant) thinking was that those sorts of decisions would only be made from keys or constraints on the table.

There's no way to determine a particular seat+row combination from show+type+best+block or vice versa.

We need show+type+best+block+row+seat to identify an individual row, but approximately 90% of the table has just a space " " for the value of "block", and zeros for both "best" and "row", and for each of those you could say any show+type would almost certainly have row+seat combinations of 0+1, 0+2 and so on.

Unfortunately it's an unnormalized legacy structure that I can't really change.

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2015-11-26 18:26:27 Re: Index scan cost calculation
Previous Message Tom Lane 2015-11-26 16:44:51 Re: Index scan cost calculation