| From: | Petr Jelinek <petr(at)2ndquadrant(dot)com> | 
|---|---|
| To: | PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org> | 
| Subject: | TABLESAMPLE patch | 
| Date: | 2014-12-10 23:24:49 | 
| Message-ID: | 5488D641.2050303@2ndquadrant.com | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
Hello,
Attached is a basic implementation of TABLESAMPLE clause. It's SQL 
standard clause and couple of people tried to submit it before so I 
think I don't need to explain in length what it does - basically returns 
"random" sample of a table using a specified sampling method.
I implemented both SYSTEM and BERNOULLI sampling as specified by SQL 
standard. The SYSTEM sampling does block level sampling using same 
algorithm as ANALYZE, BERNOULLI scans whole table and picks tuple randomly.
There is API for sampling methods which consists of 4 functions at the 
moment - init, end, nextblock and nexttuple. I added catalog which maps 
the sampling method to the functions implementing this API. The grammar 
creates new TableSampleRange struct that I added for sampling. Parser 
then uses the catalog to load information about the sampling method into 
TableSampleClause which is then attached to RTE. Planner checks for if 
this parameter is present in the RTE and if it finds it it will create 
plan with just one path - SampleScan. SampleScan implements standard 
executor API and calls the sampling method API as needed.
It is possible to write custom sampling methods. The sampling method 
parameters are not limited to just percent number as in standard but 
dynamic list of expressions which is checked against the definition of 
the init function in a similar fashion (although much simplified) as 
function calls are.
Notable lacking parts are:
- proper costing and returned row count estimation - given the dynamic 
nature of parameters I think for we'll need to let the sampling method 
do this, so there will have to be fifth function in the API.
- ruleutils support (it needs a bit of code in get_from_clause_item 
function)
- docs are sparse at the moment
-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
| Attachment | Content-Type | Size | 
|---|---|---|
| tablesample-v1.patch | text/x-diff | 87.6 KB | 
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Petr Jelinek | 2014-12-10 23:29:37 | Re: TABLESAMPLE patch | 
| Previous Message | Kevin Grittner | 2014-12-10 23:23:46 | Re: Casting issues with domains |