From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> |
Cc: | "Florian Pflug" <fgp(at)phlo(dot)org>, josh(at)agliodbs(dot)com, andres(at)anarazel(dot)de, alvherre(at)commandprompt(dot)com, ants(at)cybertec(dot)at, heikki(dot)linnakangas(at)enterprisedb(dot)com, cbbrowne(at)gmail(dot)com, neil(dot)conway(at)gmail(dot)com, robertmhaas(at)gmail(dot)com, daniel(at)heroku(dot)com, huangqiyx(at)hotmail(dot)com, pgsql-hackers(at)postgresql(dot)org, sfrost(at)snowman(dot)net |
Subject: | Re: Gsoc2012 idea, tablesample |
Date: | 2012-05-11 14:27:44 |
Message-ID: | 3891.1336746464@sss.pgh.pa.us |
Lists: | pgsql-hackers |
"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
> Florian Pflug <fgp(at)phlo(dot)org> wrote:
>> Maybe one can get rid of these sorts of problems by factoring in
>> the expected density of the table beforehand and simply accepting
>> that the results will be inaccurate if the statistics are
>> outdated?
> Unless I'm missing something, I think that works for percentage
> selection, which is what the standard talks about, without any need
> to iterate through additional samples. Good idea! We don't need to
> do any second pass to pare down initial results, either. This
> greatly simplifies coding while providing exactly what the standard
> requires.
>> I'm not totally sure whether this approach is sensitive to
>> non-uniformity in the tuple to line-pointer assignment, though.
If you're willing to accept that the quality of the results depends on
having up-to-date stats, then I'd suggest (1) use the planner's existing
technology to estimate the number of rows in the table; (2) multiply
by the sampling factor you want, to get the desired number of sample rows;
(3) use ANALYZE's existing technology to acquire that many sample rows.
While the ANALYZE code isn't perfect with respect to the problem of
nonuniform TID density, it certainly will be a lot better than
pretending that that problem doesn't exist.
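A minimal sketch of steps (1) through (3), written in Python rather than PostgreSQL's C and making several assumptions: the row estimate is simply passed in as a number, the names target_sample_size and reservoir_sample are hypothetical and are not PostgreSQL functions, and plain reservoir sampling (Algorithm R) stands in for ANALYZE's actual row-acquisition code, which this does not reproduce.

```python
import random


def target_sample_size(estimated_rows, sampling_percent):
    """Steps (1)-(2): scale a row-count estimate by the sampling percentage."""
    return max(0, round(estimated_rows * sampling_percent / 100.0))


def reservoir_sample(rows, k, rng=None):
    """Step (3) stand-in: pick k rows uniformly from an iterable in one pass.

    This is Algorithm R, used here only for illustration; it is not the
    code ANALYZE uses.
    """
    rng = rng or random.Random()
    reservoir = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = row
    return reservoir


# Hypothetical numbers: stats claim 1000 rows, the table really has 1200.
estimated_rows = 1000
actual_rows = range(1200)
k = target_sample_size(estimated_rows, 10)
sample = reservoir_sample(actual_rows, k, random.Random(0))
```

With these made-up numbers the target is 100 rows, so the sample covers about 8.3% of the real table instead of the requested 10%. That gap is the inaccuracy one accepts when the statistics are out of date.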
regards, tom lane