From: | Ants Aasma <ants(at)cybertec(dot)at> |
---|---|
To: | Qi Huang <huangqiyx(at)hotmail(dot)com>, heikki(dot)linnakangas(at)enterprisedb(dot)com, josh(at)agliodbs(dot)com, pgsql-hackers(at)postgresql(dot)org, andres(at)anarazel(dot)de, alvherre(at)commandprompt(dot)com, neil(dot)conway(at)gmail(dot)com, daniel(at)heroku(dot)com, cbbrowne(at)gmail(dot)com, kevin(dot)grittner(at)wicourts(dot)gov |
Subject: | Re: Gsoc2012 idea, tablesample |
Date: | 2012-04-23 17:34:44 |
Message-ID: | CA+CSw_twJkJ0P45oGNXRJ4RFfknEuuR4nVd=GqXUGw4gGm-HUA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, Apr 23, 2012 at 4:37 PM, Sandro Santilli <strk(at)keybit(dot)net> wrote:
> I'd love to see enhanced CTID operators, to fetch all visible tuples in a page
> using a tidscan. Something like: WHERE ctid =~ '(501,*)' or a ctidrange.
Among other things, this would enable user-space implementation of
tablesample. Given the operator =~(tid, int) that matches the page
number and planner/executor integration so that it results in a TID
scan, you would need the following functions:
random_pages(tbl regclass, samples int) returns int[]
aggregate function:
reservoir_sample(item anyelement, samples int) returns anyarray
Implementations for both of the functions could be adapted from analyze.c.
Then tablesample could be implemented with the following query:
SELECT (SELECT reservoir_sample(some_table, 50) AS samples
FROM some_table WHERE ctid =~ ANY (rnd_pgtids))
FROM random_pages('some_table', 50) AS rnd_pgtids;
Actually, now that I think about it, it could actually be implemented
without any modifications to core at some cost to efficiency.
random_pages would have to return tid[] that contains for each
generated pagenumber all possible tids on that page.
By making the building blocks available users get more flexibility.
The downside would be that we can't automatically make better sampling
methods available.
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de
From | Date | Subject | |
---|---|---|---|
Next Message | Robert Haas | 2012-04-24 02:03:06 | Re: Patch: add timing of buffer I/O requests |
Previous Message | Noah Misch | 2012-04-23 16:30:03 | psql omits row count under "\x auto" |