From: | Josh Berkus <josh(at)agliodbs(dot)com> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Improving N-Distinct estimation by ANALYZE |
Date: | 2006-01-05 19:40:19 |
Message-ID: | 200601051140.20156.josh@agliodbs.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Greg,
> We *currently* use a block based sampling algorithm that addresses this
> issue by taking care to select rows within the selected blocks in an
> unbiased way. You were proposing reading *all* the records from the
> selected blocks, which throws away that feature.
The block-based algorithms have specific math to cope with this. Stuff
which is better grounded in statistical analysis than our code. Please
read the papers before you judge the solution.
> Worse, my recollection from the paper I mentioned earlier was that
> sampling small percentages like 3-5% didn't get you an acceptable
> accuracy. Before you got anything reliable you found you were sampling
> very large percentages of the table. And note that if you have to sample
> anything over 10-20% you may as well just read the whole table. Random
> access reads are that much slower.
Right, which is why researchers are currently looking for something better.
The Brutlag & Richardson claims to be able to produce estimates which are
within +/- 3x 90% of the time using a 5% sample, which is far better than
our current accuracy. Nobody claims to be able to estimate based on <
0.1% of the table, which is what Postgres tries to do for large tables.
5% based on block-based sampling is reasonable; that means a straight 5% of
the on-disk size of the table, so 5gb for a 100gb table. With random-row
sampling, that would require as much as 25% of the table, making it easier
to just scan the whole thing.
--
--Josh
Josh Berkus
Aglio Database Solutions
San Francisco
From | Date | Subject | |
---|---|---|---|
Next Message | Josh Berkus | 2006-01-05 19:44:24 | Re: [HACKERS] Inconsistent syntax in GRANT |
Previous Message | Bruce Momjian | 2006-01-05 19:06:34 | Re: [HACKERS] Inconsistent syntax in GRANT |