From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Manfred Koizar <mkoi-pg(at)aon(dot)at> |
Cc: | Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>, pgsql-performance(at)postgresql(dot)org |
Subject: | Re: query slows down with more accurate stats |
Date: | 2004-04-19 16:00:10 |
Message-ID: | 25762.1082390410@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers pgsql-performance |
Manfred Koizar <mkoi-pg(at)aon(dot)at> writes:
> Random sampling is more like "every possible sample is equally likely to
> be collected", and two-stage sampling doesn't satisfy this condition.
Okay, I finally see the point here: in the limit as the number of pages
B goes to infinity, you'd expect the probability that each tuple in your
sample came from a different page to go to 1. But this doesn't happen
in the two-stage sampling method: the probability doesn't increase
beyond the value it would have for B=n. On the average each sample page
would supply one tuple, but the odds that this holds *exactly* would be
pretty low.
However the existing sampling method has glaring flaws of its own,
in particular having to do with the fact that a tuple whose slot is
preceded by N empty slots is N times more likely to be picked than one
that has no empty-slot predecessors. The fact that the two-stage
method artificially constrains the sample to come from only n pages
seems like a minor problem by comparison; I'd happily accept it to get
rid of the empty-slot bias.
A possible compromise is to limit the number of pages sampled to
something a bit larger than n, perhaps 2n or 3n. I don't have a feeling
for the shape of the different-pages probability function; would this
make a significant difference, or would it just waste cycles?
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Alvaro Herrera | 2004-04-19 16:01:09 | Re: signal 11 on AIX: 7.4.2 |
Previous Message | Fabien COELHO | 2004-04-19 15:44:38 | Re: GUC variable set, TODO |
From | Date | Subject | |
---|---|---|---|
Next Message | Litao Wu | 2004-04-19 16:26:03 | Re: sunquery and estimated rows |
Previous Message | Anjan Dave | 2004-04-19 13:52:39 | Re: Wierd context-switching issue on Xeon |