From: Alex Pilosov <alex(at)pilosoft(dot)com>
To: Zeugswetter Andreas SB <ZeugswetterA(at)wien(dot)spardat(dot)at>
Cc: "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: AW: Call for alpha testing: planner statistics revisions
Date: 2001-06-18 13:16:22
Message-ID: Pine.BSO.4.10.10106180911560.8898-100000@spider.pilosoft.com
Lists: pgsql-hackers
On Mon, 18 Jun 2001, Zeugswetter Andreas SB wrote:
> First of all thanks for the great effort, it will surely be appreciated :-)
>
> > * On large tables, ANALYZE uses a random sample of rows rather than
> > examining every row, so that it should take a reasonably short time
> > even on very large tables. Possible downside: inaccurate stats.
> > We need to find out if the sample size is large enough.
>
> Imho that is not optimal :-) ** ducks head, to evade flying hammer **
> 1. the random sample approach should be explicitly requested with some
> syntax extension
> 2. the sample size should also be tuneable with some analyze syntax
> extension (the dba chooses the tradeoff between accuracy and runtime)
> 3. if at all, an automatic analyze should do the samples on small tables,
> and accurate stats on large tables
>
> The reasoning behind this is that when the optimizer makes a "mistake"
> on small tables, the runtime penalty is small, and probably even beats
> the cost of an accurate statistics lookup. (3-page table --> no stats
> except table size needed)
I disagree.

As the Monte Carlo method shows, _as long as you_ query random rows, your
result will be sufficiently close to the real statistics. I'm not sure I
can find the math behind this, though...
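A toy simulation (plain Python, not the actual ANALYZE code; the table size, matching fraction, and sample sizes are made-up illustration values) sketches the intuition: the error of an estimate from a random sample shrinks roughly like 1/sqrt(sample_size), regardless of how big the table is, so a fixed-size sample stays accurate even for very large tables.

```python
import random

random.seed(42)  # fixed seed so the run is reproducible

TABLE_SIZE = 1_000_000
TRUE_FRACTION = 0.3  # fraction of rows matching a hypothetical predicate

# Build a fake "table" of booleans: True = row matches the predicate.
table = [i < TABLE_SIZE * TRUE_FRACTION for i in range(TABLE_SIZE)]
random.shuffle(table)

def estimate(sample_size):
    """Estimate the matching fraction from a random sample of rows."""
    sample = random.sample(table, sample_size)
    return sum(sample) / sample_size

for n in (100, 1_000, 10_000):
    est = estimate(n)
    print(f"sample={n:6d}  estimate={est:.3f}  "
          f"error={abs(est - TRUE_FRACTION):.3f}")
```

Note that the sample sizes here (100 to 10,000) are tiny next to the million-row table, yet the estimate converges: the standard error of a sampled proportion is sqrt(p*(1-p)/n), which depends on the sample size n, not the table size.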
-alex