From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: Josh Berkus <josh(at)agliodbs(dot)com>, Greg Stark <stark(at)mit(dot)edu>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ANALYZE sampling is too good
Date: 2013-12-08 22:19:48
Message-ID: 52A4F084.3070200@catalyst.net.nz
Lists: pgsql-hackers
On 09/12/13 08:03, Josh Berkus wrote:
>
> So there's a set of math designed to calculate for the skew introduced
> by reading *all* of the rows in each block. That's what I meant by
> "block-based sampling"; you read, say, 400 pages, you compile statistics
> on *all* of the rows on those pages, you apply some algorithms to adjust
> for groupings of rows based on how grouped they are. And you have a
> pretty good estimate of how grouped they are, because you just looked at
> complete sets of rows on a bunch of nonadjacent pages.
>
> Obviously, you need to look at more rows than you would with a
> pure-random sample. Like I said, the 80%+ accurate point in the papers
> seemed to be at a 5% sample. However, since those rows come from the
> same pages, the cost of looking at more rows is quite small, compared to
> the cost of looking at 64 times as many disk pages.
>
> My ACM subscription has lapsed, though; someone with a current ACM
> subscription could search for this; there are several published papers,
> with math and pseudocode.
>
This makes more sense! Unfortunately you had confused everyone (well, me
at least) by decreeing that we needed block-based sampling - when we
have been using that since 2004. There is more than one way to sample
blocks, it seems :-)
What you are actually suggesting is a way to do block-based sampling
that requires reading fewer blocks, and that is a great idea! Of course,
as Greg has just suggested, the details are gonna be the clincher. Can
you supply authors or titles of the papers?
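For what it's worth, the scheme described above can be sketched roughly
as follows. This is just a minimal Python illustration of the idea, not
ANALYZE code; the page layout and the 400-page target are made-up
assumptions, and the hard part (adjusting the statistics for within-page
clustering) is only noted in a comment:

```python
import random

def block_sample(table_pages, n_pages, seed=0):
    """Pick n_pages whole pages at random and keep EVERY row on each
    sampled page, instead of sampling individual rows spread across
    the whole table.

    table_pages: list of pages, each page being a list of row values.
    n_pages: number of pages to read (e.g. 400).
    Returns (sampled_rows, sampled_page_numbers).
    """
    rng = random.Random(seed)
    n_pages = min(n_pages, len(table_pages))
    # Uniform sample of page numbers, without replacement, so the
    # pages are nonadjacent with high probability on a large table.
    chosen = rng.sample(range(len(table_pages)), n_pages)
    rows = [row for p in chosen for row in table_pages[p]]
    # NOTE: because whole pages are kept, rows from the same page are
    # correlated; any estimator built on `rows` (ndistinct, MCVs, etc.)
    # would still need a clustering correction, per the cited papers.
    return rows, chosen
```

The point of the trade-off is visible here: for the same number of
sampled rows, this touches far fewer distinct pages than a row-level
sample would, at the cost of a skewed (clustered) sample that the
follow-up math has to correct for.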
Also, with reference to Greenplum: their use of Postgres is a fairly
special case, and an approach that works well for them is not always
suitable for more general-purpose use.
Regards
Mark