Re: Gsoc2012 idea, tablesample

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Florian Pflug <fgp(at)phlo(dot)org>
Cc:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, josh(at)agliodbs(dot)com, andres(at)anarazel(dot)de, alvherre(at)commandprompt(dot)com, ants(at)cybertec(dot)at, heikki(dot)linnakangas(at)enterprisedb(dot)com, cbbrowne(at)gmail(dot)com, neil(dot)conway(at)gmail(dot)com, robertmhaas(at)gmail(dot)com, daniel(at)heroku(dot)com, huangqiyx(at)hotmail(dot)com, pgsql-hackers(at)postgresql(dot)org, sfrost(at)snowman(dot)net
Subject:	Re: Gsoc2012 idea, tablesample
Date:	2012-05-11 14:13:17
Message-ID:	3460.1336745597@sss.pgh.pa.us
Lists:	pgsql-hackers

Florian Pflug <fgp(at)phlo(dot)org> writes:
> This all hinges on the ability to produce a sufficiently accurate estimate of the
> TID density p_tup/p_tid, of course.

I think that's the least of its problems. AFAICS this analysis ignores
(1) the problem that the TID space is nonuniform, i.e., we don't know how
many tuples are in each page until we look;
(2) the problem that we don't know the overall number of tuples
beforehand.

I'm not sure that there is any way to deal with (1) fully without
examining every single page, but algorithms that assume that the TIDs
are numbered linearly are broken before they start.
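
To make point (1) concrete, here is a minimal sketch in Python of what can go wrong when a sampler treats TIDs as if they were numbered linearly. Everything in it is an illustrative assumption rather than anything from PostgreSQL itself: the toy table, the function names `linear_tid_sample` and `reachable_fraction`, and the choice of the average tuples-per-page as the assumed density. The sampler maps a random "linear tuple number" to a (page, offset) pair as though every page held the same number of tuples.

```python
import random
from collections import Counter


def linear_tid_sample(tuples_per_page, assumed_density, k, rng):
    """Draw k TIDs while assuming every page holds `assumed_density` tuples.

    tuples_per_page[i] is the real number of tuples on page i, which the
    sampler only learns when it "looks" at the page.
    """
    n_pages = len(tuples_per_page)
    hits, misses = [], 0
    while len(hits) < k:
        # Pick a "linear" tuple number and map it to (page, offset).
        i = rng.randrange(n_pages * assumed_density)
        page, offset = divmod(i, assumed_density)
        offset += 1  # offsets within a page are 1-based here
        if offset <= tuples_per_page[page]:
            hits.append((page, offset))
        else:
            misses += 1  # the slot does not exist on this sparse page
    return hits, misses


def reachable_fraction(tuples_per_page, assumed_density):
    """Fraction of real tuples the linear mapping can ever select."""
    total = sum(tuples_per_page)
    reachable = sum(min(n, assumed_density) for n in tuples_per_page)
    return reachable / total


# Hypothetical toy table: two sparse pages and two dense pages.
tuples_per_page = [2, 10, 2, 10]
density = sum(tuples_per_page) // len(tuples_per_page)  # average density

rng = random.Random(0)
hits, misses = linear_tid_sample(tuples_per_page, density, 1000, rng)
per_page = Counter(page for page, _ in hits)

print("assumed density:", density)
print("reachable fraction:", reachable_fraction(tuples_per_page, density))
print("hits per page:", dict(per_page))
print("wasted probes:", misses)
```

In this toy setup, tuples at offsets beyond the assumed density on the dense pages can never be chosen, and probes that land on the sparse pages are frequently wasted. Raising the assumed density to the maximum per-page count removes the unreachable tuples, but that maximum is exactly the kind of information that is not known until the pages have been examined.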

regards, tom lane

In response to

Re: Gsoc2012 idea, tablesample at 2012-05-11 13:54:35 from Florian Pflug
