From: | Florian Pflug <fgp(at)phlo(dot)org> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, josh(at)agliodbs(dot)com, andres(at)anarazel(dot)de, alvherre(at)commandprompt(dot)com, ants(at)cybertec(dot)at, heikki(dot)linnakangas(at)enterprisedb(dot)com, cbbrowne(at)gmail(dot)com, neil(dot)conway(at)gmail(dot)com, daniel(at)heroku(dot)com, huangqiyx(at)hotmail(dot)com, pgsql-hackers(at)postgresql(dot)org, sfrost(at)snowman(dot)net |
Subject: | Re: Gsoc2012 idea, tablesample |
Date: | 2012-05-11 15:45:18 |
Message-ID: | 6340B96F-19A0-4894-9D83-5F895BE41659@phlo.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On May11, 2012, at 17:20 , Robert Haas wrote:
> On Fri, May 11, 2012 at 11:04 AM, Kevin Grittner
>> Since T cancels out of that equation, we don't need to worry about
>> estimating it. Results will be correct for any value of M which is
>> *at least* as large as the maximum tuple index in the table,
>> although values of M larger than that will degrade performance. The
>> same holds true for the number of pages.
>
> The trouble is, AFAICS, that you can't bound M very well without
> scanning the whole table. I mean, it's bounded by theoretical limit,
> but that's it.
Hm, but maybe Kevin's observation that the actual value of M shouldn't
matter as long as it's large enough helps here. What if you start out
with M=1 and generate your first TID. After reading the page, but before
returning a tuple, you check if M is indeed an upper bound on the tuple
indices. If it isn't, you increase M, recompute N (the number of probes),
determine a new random tuple index, and return the tuple (if it is live).
Would that introduce bias? I'd think not, because scaling up N shouldn't,
per Kevin's argument, change the probability of a TID being picked. So
increasing N in the middle of a scan should neither penalize nor favour
tuples which were already returned compared to those which will follow,
no?
(And yes, even if there don't turn out to be any obvious holes in this
argument, it requires more formal proof that I was able to give here
before being turned into code. Or at the very least excessive testing
which all kinds of data)
best regards,
Florian Pflug
From | Date | Subject | |
---|---|---|---|
Next Message | Kevin Grittner | 2012-05-11 15:50:37 | Re: Gsoc2012 idea, tablesample |
Previous Message | Andrew Dunstan | 2012-05-11 15:44:49 | Re: Draft release notes complete |