From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Robert Haas" <robertmhaas(at)gmail(dot)com>
Cc: <josh(at)agliodbs(dot)com>, <andres(at)anarazel(dot)de>, <alvherre(at)commandprompt(dot)com>, <ants(at)cybertec(dot)at>, <heikki(dot)linnakangas(at)enterprisedb(dot)com>, <cbbrowne(at)gmail(dot)com>, <neil(dot)conway(at)gmail(dot)com>, <daniel(at)heroku(dot)com>, <huangqiyx(at)hotmail(dot)com>, "Florian Pflug" <fgp(at)phlo(dot)org>, <pgsql-hackers(at)postgresql(dot)org>, <sfrost(at)snowman(dot)net>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Gsoc2012 idea, tablesample
Date: 2012-05-11 15:36:16
Message-ID: 4FACEBA00200002500047BA4@gw.wicourts.gov
Lists: pgsql-hackers
Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> The trouble is, AFAICS, that you can't bound M very well without
> scanning the whole table. I mean, it's bounded by the theoretical
> limit, but that's it.
What would the theoretical limit be? (block size - page header size
- minimum size of one tuple) / item pointer size? So, on an 8KB
page, somewhere in the neighborhood of 1350? Hmm. If that's right,
that would mean a 1% random sample would need 13.5 probes per page,
meaning there wouldn't tend to be a lot of pages missed. Still, the
technique for getting a random sample seems sound, unless someone
suggests something better. Maybe we just want to go straight to a
seqscan to get to the pages we want to probe rather than reading
just the ones on the "probe list" in physical order?
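The back-of-envelope arithmetic above can be sketched quickly. This is a minimal illustration, not PostgreSQL source: the constants (8KB block, 24-byte page header, 28-byte minimal tuple, 6-byte per-tuple overhead) are assumptions chosen to reproduce the "neighborhood of 1350" figure, not authoritative values from the server headers:

```python
import math

# Upper bound on tuples per page, per the formula in the mail:
#   (block size - page header size - minimum tuple size) / item pointer size
# All constants below are illustrative assumptions.
BLOCK_SIZE = 8192     # assumed 8KB page
PAGE_HEADER = 24      # assumed page header size
MIN_TUPLE = 28        # assumed minimal heap tuple size
ITEM_POINTER = 6      # assumed per-tuple line pointer overhead

max_tuples_per_page = (BLOCK_SIZE - PAGE_HEADER - MIN_TUPLE) // ITEM_POINTER
print(max_tuples_per_page)        # -> 1356, in the "neighborhood of 1350"

# A 1% random sample over TIDs then lands about this many probes per page:
probes_per_page = 0.01 * max_tuples_per_page
print(round(probes_per_page, 2))  # -> 13.56

# Chance a given page gets no probe at all (each candidate TID sampled
# independently with probability 0.01), supporting "not a lot of pages missed":
p_missed = 0.99 ** max_tuples_per_page
print(p_missed < 1e-5)            # -> True
```

With a miss probability on the order of 10^-6 per page, nearly every page gets at least one probe, which is why reading the probe list in physical order starts to look a lot like a seqscan anyway.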
-Kevin