Re: AW: Call for alpha testing: planner statistics revisions

From: Alex Pilosov <alex(at)pilosoft(dot)com>
To: Zeugswetter Andreas SB <ZeugswetterA(at)wien(dot)spardat(dot)at>
Cc: "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: AW: Call for alpha testing: planner statistics revisions
Date: 2001-06-18 13:16:22
Message-ID: Pine.BSO.4.10.10106180911560.8898-100000@spider.pilosoft.com
Lists: pgsql-hackers

On Mon, 18 Jun 2001, Zeugswetter Andreas SB wrote:

> First of all thanks for the great effort, it will surely be appreciated :-)
>
> > * On large tables, ANALYZE uses a random sample of rows rather than
> > examining every row, so that it should take a reasonably short time
> > even on very large tables. Possible downside: inaccurate stats.
> > We need to find out if the sample size is large enough.
>
> Imho that is not optimal :-) ** ducks head, to evade flying hammer **
> 1. the random sample approach should be explicitly requested with some
> syntax extension
> 2. the sample size should also be tuneable with some analyze syntax
> extension (the dba chooses the tradeoff between accuracy and runtime)
> 3. if at all, an automatic analyze should do the samples on small tables,
> and accurate stats on large tables
>
> The reasoning behind this is, that when the optimizer does a "mistake"
> on small tables the runtime penalty is small, and probably even beats
> the cost of accurate statistics lookup. (3 page table --> no stats
> except table size needed)
I disagree.

As the Monte Carlo method shows, _as long as you_ query random rows, your
estimate will be sufficiently close to the real statistics. I'm not sure I
can find the math behind this, though...
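(A quick illustration of the point, in hypothetical Python rather than
anything from the patch; the table sizes and selectivity here are made up.
The sampling error of an estimate depends on the sample size, not on the
table size, so a fixed-size random sample works even on very large tables.)

```python
import random

# Hypothetical demo, not PostgreSQL code: estimate the fraction of rows
# matching a predicate from a fixed-size random sample, and compare it
# to the true fraction computed over the whole simulated "table".
random.seed(42)

n_rows = 1_000_000
# Simulated column: roughly 10% of rows hold the value 1, the rest 0.
table = [1 if random.random() < 0.10 else 0 for _ in range(n_rows)]

true_frac = sum(table) / n_rows

sample_size = 3000  # fixed sample size, independent of n_rows
sample = random.sample(table, sample_size)
est_frac = sum(sample) / sample_size

# The standard error of est_frac shrinks like 1/sqrt(sample_size)
# no matter how large n_rows is.
print(f"true={true_frac:.4f} estimate={est_frac:.4f}")
```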

-alex
