From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: sthomas(at)optionshouse(dot)com
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Setting Statistics on Functional Indexes
Date: 2012-10-26 21:08:18
Message-ID: 3040.1351285698@sss.pgh.pa.us
Lists: pgsql-performance
Shaun Thomas <sthomas(at)optionshouse(dot)com> writes:
> But I just noticed the lag in your response. :) It turns out, even
> though I was substituting 2012-10-24 or 2012-10-25, what I really meant
> was current_date. That does make all the difference, actually.
Ah. [ pokes at that for a while... ] OK, this has nothing to do with
functional indexes, and everything to do with the edge-case behavior of
scalarltsel. What you've got is a histogram whose last entry
(corresponding to the highest observed value of the date) is
current_date, and the question is what we should assume when estimating
how many rows have a value >= that. The answer of course is "one, plus
any duplicates" ... but we don't know how many duplicates there are,
and what we do know is it's not a particularly large number because the
value isn't present in the most-common-values stats. So the code there
assumes there aren't any dups.
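
For readers following along, a minimal way to see what the planner has
to work with here (assuming a hypothetical table "mytable" with a date
column "action_date") is to look at the column's row in pg_stats:

    -- hypothetical names; shows what the last ANALYZE stored
    SELECT n_distinct, most_common_vals, histogram_bounds
    FROM pg_stats
    WHERE tablename = 'mytable' AND attname = 'action_date';

If current_date appears only as the final histogram_bounds entry and not
in most_common_vals, you're in the edge case described above.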
Once you have enough histogram resolution for current_date to show up
as the next-to-last as well as the last histogram entry, then of course
the estimate gets a lot better, since we can now tell that there's at
least one histogram bin's worth of duplicates.
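
A minimal sketch of that remedy, assuming the same hypothetical names
(the default per-column statistics target is 100; values up to 10000
are accepted):

    -- hypothetical names; more buckets means the endpoint value can
    -- occupy more than one histogram entry
    ALTER TABLE mytable ALTER COLUMN action_date SET STATISTICS 1000;
    ANALYZE mytable;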
Interestingly, this is a case where the get_actual_variable_range patch
(commit 40608e7f, which appeared in 9.0) makes the results worse.
Before that, there was a (very arbitrary) lower bound on what we'd
believe as the selectivity of a >= condition, but now, when we know the
actual upper limit of the variable, we don't clamp the result that way.
I think the clamp must have been saving you in your previous version,
because it more-or-less-accidentally accounted for the end value not
being unique.
So the bottom line is that this is a case where you need a lot of
resolution in the histogram. I'm not sure there's anything good
we can do to avoid that. I spent a bit of time thinking about whether
we could use n_distinct to get some idea of how many duplicates there
might be for the endpoint value, but n_distinct is unreliable enough
that I can't develop a lot of faith in such a thing. Or we could just
arbitrarily assume some fraction-of-a-histogram-bin's worth of
duplicates, but that would make the results worse for some people.
regards, tom lane