From: Zeugswetter Andreas SB <ZeugswetterA(at)wien(dot)spardat(dot)at>
To: "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: AW: Call for alpha testing: planner statistics revisions
Date: 2001-06-18 09:43:17
Message-ID: 11C1E6749A55D411A9670001FA68796336832B@sdexcsrv1.f000.d0188.sd.spardat.at
Lists: pgsql-hackers
First of all, thanks for the great effort; it will surely be appreciated :-)
> * On large tables, ANALYZE uses a random sample of rows rather than
> examining every row, so that it should take a reasonably short time
> even on very large tables. Possible downside: inaccurate stats.
> We need to find out if the sample size is large enough.
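To get a feel for whether a fixed-size random sample is "large enough", one can simulate estimating a per-column statistic from samples of various sizes. The sketch below estimates a column's NULL fraction; the table size, the 10% true fraction, and the sample sizes are all made-up illustration values, not anything PostgreSQL actually uses. The point is just that the standard error of such an estimate shrinks like 1/sqrt(n), independent of table size.

```python
import random

random.seed(42)

TABLE_ROWS = 1_000_000
TRUE_NULL_FRACTION = 0.10  # assumed ground truth for the simulation

# Synthetic column: True marks a NULL row.
column = [random.random() < TRUE_NULL_FRACTION for _ in range(TABLE_ROWS)]

for sample_size in (300, 3_000, 30_000):
    sample = random.sample(column, sample_size)
    estimate = sum(sample) / sample_size
    # Standard error of a sampled proportion: sqrt(p * (1 - p) / n).
    stderr = (TRUE_NULL_FRACTION * (1 - TRUE_NULL_FRACTION) / sample_size) ** 0.5
    print(f"n={sample_size:6d}  estimate={estimate:.4f}  expected stderr={stderr:.4f}")
```

Even a few thousand sampled rows pins the fraction down to within roughly half a percentage point, which is why a fixed sample can work on arbitrarily large tables; whether that accuracy suffices for join selectivity estimates is exactly the open question.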
Imho that is not optimal :-) ** ducks head to evade flying hammer **
1. the random-sample approach should be explicitly requested with some
syntax extension
2. the sample size should also be tuneable with some analyze syntax
extension (the dba chooses the tradeoff between accuracy and runtime)
3. if anything, an automatic analyze should sample the small tables
and gather accurate stats on the large ones
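The three points above amount to a simple policy. A minimal sketch follows; the threshold, default sample size, and function name are all hypothetical, chosen only to illustrate the proposal, not actual PostgreSQL behaviour:

```python
# Hypothetical policy: sampling only where explicitly requested or where
# the table is small; accurate stats on large tables by default.
ROW_THRESHOLD = 100_000   # made-up cutoff between "small" and "large"
DEFAULT_SAMPLE = 3_000    # made-up default sample size, dba-tuneable

def analyze_plan(row_count, sample_size=None, force_sample=False):
    """Return ('sample', n) or ('full', None) for a table of row_count rows."""
    if force_sample:
        # Points 1 and 2: sampling explicitly requested, size chosen by the dba.
        return ('sample', sample_size or DEFAULT_SAMPLE)
    if row_count <= ROW_THRESHOLD:
        # Point 3: on a small table a planner mistake is cheap, so a
        # sample (never larger than the table itself) is good enough.
        return ('sample', min(row_count, DEFAULT_SAMPLE))
    # Large table: a planner mistake can cost hours, so pay for full stats.
    return ('full', None)
```

For example, `analyze_plan(10_000_000)` would choose a full scan, while the dba could still force `analyze_plan(10_000_000, sample_size=50_000, force_sample=True)` to trade accuracy for runtime.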
The reasoning behind this is that when the optimizer makes a "mistake"
on a small table, the runtime penalty is small and probably even beats
the cost of looking up accurate statistics (a 3-page table needs no stats
beyond the table size).
When, on the other hand, the optimizer makes a "mistake" on a huge table,
the difference is easily a matter of hours, thus you want accurate stats.
Because we do not want the dba to decide which statistics are optimal,
there should probably be an analyze helper application, invoked with
"vacuum analyze database optimal" or some such, that also decides
whether a table has been altered enough to justify gathering new stats
or vacuuming. That decision could also be based on a runtime limit
that the dba specifies ("do the most important stats/vacuums you can do
within ~3 hours").
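The runtime-limit idea is essentially a greedy, budgeted scheduler. A minimal sketch, assuming the helper application can supply a per-table runtime estimate and a staleness score (fraction of rows changed since the last analyze) -- both hypothetical inputs, as are the table names and numbers in the example:

```python
def schedule_maintenance(tables, budget_seconds):
    """Greedily pick jobs from (name, est_cost_seconds, staleness) tuples,
    most stale first, until the runtime budget is spent."""
    chosen, spent = [], 0.0
    # Most-altered tables first: they justify fresh stats the most.
    for name, est_cost, staleness in sorted(tables, key=lambda t: -t[2]):
        if spent + est_cost <= budget_seconds:
            chosen.append(name)
            spent += est_cost
    return chosen, spent

# Example inputs: (name, estimated runtime in seconds, fraction changed)
tables = [
    ("orders",    5400, 0.40),
    ("customers",  600, 0.05),
    ("audit_log", 9000, 0.75),
    ("lookup",      30, 0.01),
]
chosen, spent = schedule_maintenance(tables, budget_seconds=3 * 3600)
print(chosen, spent)  # audit_log first; orders no longer fits the budget
```

A real helper would need better importance metrics than raw staleness (e.g. weighting by how often a table appears in queries), but the budget-driven shape of the decision is the same.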
These points are also based on experience with huge SAP/R3 installations
and the way statistics are gathered there.
Andreas