From: Greg Stark <gsstark(at)mit(dot)edu>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: max_wal_senders must die
Date: 2010-10-20 19:40:52
Message-ID: AANLkTimqyhbNpw+E1JKciFeESovszoWe6S=q1Xn1_NfW@mail.gmail.com
Lists: pgsql-hackers
On Wed, Oct 20, 2010 at 6:29 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> Exactly. It doesn't take many 3-7% slowdowns to add up to being 50%
> or 100% slower, and that sucks. In fact, I'm still not convinced that
> we were wise to boost default_statistics_target as much as we did. I
> argued for a smaller boost at the time.
Well, we don't want to let ourselves be paralyzed by FUD, so it was
important to identify specific concerns and then tackle them. Once we
identified the worst-case planning cases we profiled them and found
that the inflection point of the curve was fairly clearly above 100,
but that there were cases where values below 1,000 caused problems. So
I'm pretty happy with the evidence-based approach.

The problem with being overly conservative is that it gives free rein
to the folks who were shouting that we should just set the default to
1,000. They weren't wrong that 10 was overly conservative, and in the
absence of evidence 1,000 was just as reasonable.
> Actually, I think the best thing for default_statistics_target might
> be to scale the target based on the number of rows in the table, e.g.
> given N rows:
The number of buckets needed isn't related to the population size --
it's related to how wide the ranges you'll be estimating selectivity
for are. That is, with our current code, if you're selecting tuples
within a range a..b and that range happens to be about the same width
as a histogram bucket, then you'll get an accurate estimate with a
fixed 95th-percentile precision regardless of the size of the table
(to an approximation).
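
To see what I mean, here's a rough sketch (hypothetical table "t" with
an integer column "x"; the pg_stats columns and commands are real, the
values are made up):

  -- the statistics target sets the number of histogram buckets
  ALTER TABLE t ALTER COLUMN x SET STATISTICS 100;
  ANALYZE t;

  -- histogram_bounds splits the non-MCV values into equal-frequency buckets
  SELECT histogram_bounds
    FROM pg_stats
   WHERE tablename = 't' AND attname = 'x';

  -- for a range predicate the planner counts how many buckets a..b covers,
  -- so a range about one bucket wide is estimated at roughly 1/target of
  -- the table, no matter how many rows the table has
  EXPLAIN SELECT * FROM t WHERE x BETWEEN 100 AND 110;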
I'm not sure how our selectivity estimation works at all for the
degenerate case of selecting for specific values; I don't understand
how histograms are useful for such estimates. I think the MCV lists
are basically an attempt to overcome this problem, and as you point
out I'm not sure the statistics target is really the right thing to
control them -- but since I don't think there's any real statistics
behind them, I'm not sure there's any right way to control them.
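
For the equality case, the MCV side of things looks like this (same
hypothetical table as above; the pg_stats columns are real):

  -- most_common_vals/most_common_freqs hold at most statistics-target entries;
  -- equality selectivity for a listed value comes straight from its frequency,
  -- anything not in the list falls back to arithmetic over n_distinct
  SELECT most_common_vals, most_common_freqs, n_distinct
    FROM pg_stats
   WHERE tablename = 't' AND attname = 'x';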
--
greg