From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, josh(at)agliodbs(dot)com, andres(at)anarazel(dot)de, alvherre(at)commandprompt(dot)com, ants(at)cybertec(dot)at, heikki(dot)linnakangas(at)enterprisedb(dot)com, cbbrowne(at)gmail(dot)com, neil(dot)conway(at)gmail(dot)com, daniel(at)heroku(dot)com, huangqiyx(at)hotmail(dot)com, Florian Pflug <fgp(at)phlo(dot)org>, pgsql-hackers(at)postgresql(dot)org, sfrost(at)snowman(dot)net |
Subject: | Re: Gsoc2012 idea, tablesample |
Date: | 2012-05-11 15:20:07 |
Message-ID: | CA+TgmoaQ6w=VcEwwVbBkN7GH4eVvaPEVfLKioHZbXcezgG4Eig@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, May 11, 2012 at 11:04 AM, Kevin Grittner
<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
> We
> will be probing random page numbers 1..P for random tuple indexes
> 1..M. So how many random probes by ctid does that require? The
> chance of a hit on each probe is ((T/P)/M) -- the average number of
> tuples per page divided by the number of tuple index values allowed.
> So we need (S*T)/((T/P)/M) probes. Simplifying that winds up being
> S*M*P the product of the sample size as a percentage, the maximum
> tuple index on a page, and the number of pages. (A calculation some
> may have jumped to as intuitively obvious.)
>
> So let's call the number of probes N. We randomly generate N
> distinct ctid values, where each is a random page number 1..P and a
> random index 1..M. We attempt to read each of these in block number
> order, not following HOT chains. For each, if the tuple exists and
> is visible, it is part of our result set.
>
> Since T cancels out of that equation, we don't need to worry about
> estimating it. Results will be correct for any value of M which is
> *at least* as large as the maximum tuple index in the table,
> although values of M larger than that will degrade performance. The
> same holds true for the number of pages.
The trouble is, AFAICS, that you can't bound M very well without
scanning the whole table. I mean, it's bounded by theoretical limit,
but that's it.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2012-05-11 15:35:28 | Re: Gsoc2012 idea, tablesample |
Previous Message | Kevin Grittner | 2012-05-11 15:04:46 | Re: Gsoc2012 idea, tablesample |