From: | Petr Jelinek <petr(at)2ndquadrant(dot)com> |
---|---|
To: | PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org> |
Subject: | TABLESAMPLE patch |
Date: | 2014-12-10 23:24:49 |
Message-ID: | 5488D641.2050303@2ndquadrant.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hello,
Attached is a basic implementation of TABLESAMPLE clause. It's SQL
standard clause and couple of people tried to submit it before so I
think I don't need to explain in length what it does - basically returns
"random" sample of a table using a specified sampling method.
I implemented both SYSTEM and BERNOULLI sampling as specified by SQL
standard. The SYSTEM sampling does block level sampling using same
algorithm as ANALYZE, BERNOULLI scans whole table and picks tuple randomly.
There is API for sampling methods which consists of 4 functions at the
moment - init, end, nextblock and nexttuple. I added catalog which maps
the sampling method to the functions implementing this API. The grammar
creates new TableSampleRange struct that I added for sampling. Parser
then uses the catalog to load information about the sampling method into
TableSampleClause which is then attached to RTE. Planner checks for if
this parameter is present in the RTE and if it finds it it will create
plan with just one path - SampleScan. SampleScan implements standard
executor API and calls the sampling method API as needed.
It is possible to write custom sampling methods. The sampling method
parameters are not limited to just percent number as in standard but
dynamic list of expressions which is checked against the definition of
the init function in a similar fashion (although much simplified) as
function calls are.
Notable lacking parts are:
- proper costing and returned row count estimation - given the dynamic
nature of parameters I think for we'll need to let the sampling method
do this, so there will have to be fifth function in the API.
- ruleutils support (it needs a bit of code in get_from_clause_item
function)
- docs are sparse at the moment
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment | Content-Type | Size |
---|---|---|
tablesample-v1.patch | text/x-diff | 87.6 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Petr Jelinek | 2014-12-10 23:29:37 | Re: TABLESAMPLE patch |
Previous Message | Kevin Grittner | 2014-12-10 23:23:46 | Re: Casting issues with domains |