Re: Gsoc2012 idea, tablesample

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: "Robert Haas" <robertmhaas(at)gmail(dot)com>, josh(at)agliodbs(dot)com, andres(at)anarazel(dot)de, alvherre(at)commandprompt(dot)com, ants(at)cybertec(dot)at, heikki(dot)linnakangas(at)enterprisedb(dot)com, cbbrowne(at)gmail(dot)com, neil(dot)conway(at)gmail(dot)com, daniel(at)heroku(dot)com, huangqiyx(at)hotmail(dot)com, "Florian Pflug" <fgp(at)phlo(dot)org>, pgsql-hackers(at)postgresql(dot)org, sfrost(at)snowman(dot)net
Subject: Re: Gsoc2012 idea, tablesample
Date: 2012-05-11 15:42:21
Message-ID: 5527.1336750941@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
> Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> The trouble is, AFAICS, that you can't bound M very well without
>> scanning the whole table. I mean, it's bounded by theoretical
>> limit, but that's it.

> What would the theoretical limit be? (black size - page header size
> - minimum size of one tuple) / item pointer size? So, on an 8KB
> page, somewhere in the neighborhood of 1350?

Your math is off --- I get something less than 300, even if the tuples
are assumed to be empty of data. (Header size 24 bytes, plus 4-byte
line pointer, so at least 28 bytes per tuple, so at most 292 on an 8K
page.) But you still end up probing just about every page for a 1%
sample.

regards, tom lane

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2012-05-11 15:44:00 Re: incorrect handling of the timeout in pg_receivexlog
Previous Message Kevin Grittner 2012-05-11 15:36:16 Re: Gsoc2012 idea, tablesample