Re: Idea for the statistics collector

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Doug Fields <dfields(at)pexicom(dot)com>
Cc: Jan Wieck <JanWieck(at)Yahoo(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Postgresql General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Idea for the statistics collector
Date: 2005-04-20 02:44:57
Message-ID: 200504200244.j3K2iva26680@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers


Added to TODO list:

* Log queries where the optimizer row estimates were dramatically
different from the number of rows actually found (?)

---------------------------------------------------------------------------

Doug Fields wrote:
>
> >Tom Lane wrote:
> > > Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > > > Martijn van Oosterhout wrote:
> > > >> Firstly, I was only thinking of going for the basic nodes (Index
> > Scan, Seq
> > > >> Scan, Distinct). Other types have far more variables. Secondly, even
> > if you
> > > >> only count, it's useful. For example, if it tells you that the
> > planner is
> > > >> off by a factor of 10 more than 75% of the time, that's useful
> > information
> > > >> independant of what the actual variables are.
> > >
> > > And reduce the usefulness even more ;-). As a pure stats-gathering
> > > exercise it might be worth doing, but not if you only log the failure
> > > cases. How will you know how well you are doing if you take a
> > > biased-by-design sample?
>
> Personally, given that it seems like at least once or twice a day someone
> asks about performance or "why isn't my index being used" and other stuff -
> I think doing this would be a great idea.
>
> Perhaps not necessarily in the full-fledged way, but creating a sort of
> "ANALYZE log," wherein it logs the optimizer's estimate of a query and the
> actual results of a query, for every query. This, of course, could be
> enableable/disableable on a per-connection basis, per-table basis (like
> OIDs), or whatever other basis makes life easiest to the developers.
>
> Then, when the next ANALYZE is run, it could do it's usual analysis, and
> apply some additional heuristics based upon what it learns from the
> "ANALYZE log," possibly to do several things:
>
> 1) Automatically increase/decrease the SET STATISTICS information included
> in the analyze, for example, increasing it as a table grows larger and the
> "randomness" grows less than linearly with size (e.g., if you have 50 or 60
> groups in a 1,000,000 row table, that certainly needs a higher SET
> STATISTICS and I do it on my tables).
> 2) Have an additional value on the statistics table called the
> "index_heuristic" or "random_page_adjustment_heuristic" which when 1 does
> nothing, but otherwise modifies the cost of using an index/seq scan by that
> factor - and don't ever change this more than a few percent each ANALYZE
> 3) Flags in a second log (maybe the regular log) really bad query estimates
> - let it do an analysis of the queries and flag anything two or three std
> deviations outside.
>
> Now, I suggest all this stuff in the name of usability and
> self-maintainability. Unfortunately, I don't have the wherewithal to
> actually assist in development.
>
> Another possibility is to put "use_seq_scan" default to OFF, or whatever
> the parameter is (I did my optimizing a while ago so it's fading), so that
> if there's an index, it will use it, regardless - as this seems to be what
> the great majority of people expect to happen. And/or add this to a FAQ,
> and let us all reply "see http://.../indexfaq.html." :)
>
> Cheers,
>
> Doug
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/users-lounge/docs/faq.html
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Bruce Momjian 2005-04-20 02:48:08 Re: [GENERAL] Idea for the statistics collector
Previous Message Lorenzo Thurman 2005-04-20 01:29:32 Re: pg_dump fails

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2005-04-20 02:48:08 Re: [GENERAL] Idea for the statistics collector
Previous Message Jim C. Nasby 2005-04-19 22:55:54 Re: argtype_inherit() is dead code