From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Dann Corbit <DCorbit(at)connx(dot)com>, "Decibel!" <decibel(at)decibel(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Mark Wong <markwkm(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Greg Smith <gsmith(at)gregsmith(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Simple postgresql.conf wizard
Date: 2008-12-02 20:39:47
Message-ID: 200812022039.mB2KdlQ06520@momjian.us
Lists: pgsql-hackers

Gregory Stark wrote:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>
> > "Dann Corbit" <DCorbit(at)connx(dot)com> writes:
> >> I also do not believe that there is any value that will be the right
> >> answer. But a table of data might be useful both for people who want to
> >> toy with altering the values and also for those who want to set the
> >> defaults. I guess that at one time such a table was generated to
> >> produce the initial estimates for default values.
> >
> > Sir, you credit us too much :-(. The actual story is that the current
> > default of 10 was put in when we first implemented stats histograms,
> > replacing code that kept track of only a *single* most common value
> > (and not very well, at that). So it was already a factor of 10 more
> > stats than we had experience with keeping, and accordingly conservatism
> > suggested not boosting the default much past that.
>
> I think that's actually too little credit. The sample size is chosen quite
> carefully based on solid mathematics to provide a specific confidence interval
> estimate for queries covering ranges the size of a whole bucket.
>
> The actual number of buckets is more of an arbitrary choice. It depends entirely
> on how your data is distributed and how large a range your queries are
> covering. A uniformly distributed data set should only need a single bucket to
> generate good estimates. Less evenly distributed data sets need more.
>
> I wonder actually if there are algorithms for estimating the number of buckets
> needed for a histogram to achieve some measurable goal. That would close the
> loop. It would be much more reassuring to base the size of the sample on solid
> statistics than on hunches.
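[As a rough sketch of the closed loop Greg describes: build an equi-depth histogram over a skewed sample and watch the range-estimate error shrink as the bucket count grows. This is illustrative code only, not PostgreSQL internals; the function names and the interpolation scheme are stand-ins for what ANALYZE and the planner actually do.]

```python
import random

random.seed(42)
# Skewed data: a single-bucket (uniform) model fits this badly.
data = sorted(random.expovariate(1.0) for _ in range(100_000))
N = len(data)

def equi_depth_bounds(values, nbuckets):
    """Boundaries so each bucket holds ~N/nbuckets rows, analogous to
    the histogram_bounds ANALYZE stores (values must be sorted)."""
    step = len(values) / nbuckets
    return [values[min(int(i * step), len(values) - 1)]
            for i in range(nbuckets + 1)]

def est_range_fraction(bounds, lo, hi):
    """Planner-style estimate: each bucket holds 1/nbuckets of the rows;
    interpolate linearly inside partially covered buckets."""
    nb = len(bounds) - 1
    frac = 0.0
    for i in range(nb):
        b_lo, b_hi = bounds[i], bounds[i + 1]
        if b_hi <= lo or b_lo >= hi or b_hi == b_lo:
            continue
        overlap = min(hi, b_hi) - max(lo, b_lo)
        frac += (overlap / (b_hi - b_lo)) / nb
    return frac

lo, hi = 0.5, 1.5
true_frac = sum(lo <= x < hi for x in data) / N
errs = {}
for k in (1, 10, 100):
    est = est_range_fraction(equi_depth_bounds(data, k), lo, hi)
    errs[k] = abs(est - true_frac)
    print(f"{k:4d} buckets: est={est:.4f}  true={true_frac:.4f}  err={errs[k]:.4f}")
```

[Running something like this per-column is one way to "close the loop": keep doubling the bucket count until the measured error on sampled range queries drops below a target threshold.]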
I have a few thoughts on this. First, people are correct that there is
no perfect default_statistics_target value. This is similar to the
problem with the pre-8.4 max_fsm_pages/max_fsm_relations, for which
there also was never a perfect value. But, if the FSM couldn't store
all the free space, a server log message was issued that recommended
increasing these values; the same is still done for
checkpoint_segments. Is there a way we could emit a server log message
to recommend increasing the statistics targets for specific columns?
Also, is there a way to make lookups against large statistics targets
more efficient? I assume the values in the pg_statistic arrays are
already sorted; do we already do a binary search on those? Does that
help?
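[For illustration, the binary lookup in question would look roughly like this; the real planner code (in selfuncs.c) differs in detail, and the bounds below are made up:]

```python
import bisect

def histogram_bucket(bounds, value):
    """Find the histogram bucket containing value with a binary search
    (O(log n) comparisons) over the sorted bounds, not a linear scan."""
    i = bisect.bisect_right(bounds, value) - 1
    # Clamp values that fall outside the histogram's range.
    return max(0, min(i, len(bounds) - 2))

# Hypothetical equi-depth bounds for some column (6 buckets).
bounds = [0, 10, 25, 50, 100, 250, 1000]
print(histogram_bucket(bounds, 42))    # falls in bucket 2: [25, 50)
```

[With a binary search, raising a column's statistics target from 10 to 1000 costs only a few extra comparisons per lookup, so the per-query planning overhead grows logarithmically rather than linearly.]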
--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +