| From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> | 
|---|---|
| To: | Petr Jelinek <petr(at)2ndquadrant(dot)com> | 
| Cc: | Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz> | 
| Subject: | Re: TABLESAMPLE patch | 
| Date: | 2015-01-19 06:08:32 | 
| Message-ID: | CAA4eK1KnQ28vssO4GiUduogefi9s7RJiSVq2azDA+aXk1uvjcw@mail.gmail.com | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
On Sun, Jan 18, 2015 at 12:46 AM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
> On 17/01/15 13:46, Amit Kapila wrote:
>
>
>> 3.
>> @@ -332,6 +334,11 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
>> /* Foreign table */
>> set_foreign_pathlist(root, rel, rte);
>> }
>> +else if (rte->tablesample != NULL)
>> +{
>> +/* Build sample scan on relation */
>> +set_tablesample_rel_pathlist(root, rel, rte);
>> +}
>>
>> Have you consider to have a different RTE for this?
>>
>>
> I assume you mean different RTEKind, yes I did, but the fact is that it's
> still a relation, just with different scan type and I didn't want to modify
> every piece of code which deals with RTE_RELATION to also deal with the new
> RTE for sample scan as it seems unnecessary. I am not strongly opinionated
> about this one though.
>
>
No issues, but it seems we should check other paths where
different handling could be required for tablesample scan.
In set_rel_size(), it uses normal path for heapscan (set_plain_rel_size())
for rel size estimates where it checks the presence of partial indexes
and that might effect the size estimates and that doesn't seem to
be required when we have to perform scan based on TableSample
method.
I think we can once check other places where some separate
handling for (rte->inh) is done to see if we need some different handing
for Tablesample scan.
>  4.
>> SampleNext()
>> a.
>> Isn't it better to code SampleNext() similar to SeqNext() and
>> IndexNext(), which encapsulate the logic to get the tuple in
>> another function(tbs_next() or something like that)?
>>
>
> Possibly, my thinking was that unlike the index_getnext() and
> heap_getnext(), this function would not be called from any other place so
> adding one more layer of abstraction didn't seem useful. And it would mean
> creating new ScanDesc struct, etc.
>
>
I think adding additional abstraction would simplify the function
and reduce the chance of discrepency, I think we need to almost
create a duplicate version of code for heapgetpage() method.
Yeah, I agree that we need to build structure like ScanDesc, but
still it will be worth to keep code consistent.
>
>> b.
>> Also don't we want to handle pruning of page while
>> scanning (heap_page_prune_opt()) and I observed
>> in heap scan API's after visibility check we do check
>> for serializable conflict (CheckForSerializableConflictOut()).
>>
>
> Both good points, will add.
>
>
Another one is PageIsAllVisible() check.
>  Another thing is don't we want to do anything for sync scans
>> for these method's as they are doing heap scan?
>>
>
> I don't follow this one tbh.
>
>
Refer parameter (HeapScanDesc->rs_syncscan) and syncscan.c.
Basically during sequiantial scan on same table by different
backends, we attempt to keep them synchronized to reduce the I/O.
>
>> c.
>> for bernoulli method, it will first get the tupoffset with
>> the help of function and then apply visibility check, it seems
>> that could reduce the sample set way lower than expected
>> in case lot of tuples are not visible, shouldn't it be done in reverse
>> way(first visibility check, then call function to see if we want to
>> include it in the sample)?
>> I think something similar is done in acquire_sample_rows which
>> seems right to me.
>>
>
> I don't think so, the way bernoulli works is that it returns every tuple
> with equal probability, so the visible tuples have same chance of being
> returned as the invisible ones so the issue should be smoothed away
> automatically (IMHO).
>
>
How will it get smoothen for cases when let us say
more than 50% of tuples are not visible.  I think its
due to Postgres non-overwritting storage architecture
that we maintain multiple version of rows in same heap,
otherwise for different kind of architecture (like mysql/oracle)
where multiple row versions are not maintained in same
heap, the same function (with same percentage) can return
entirely different number of rows.
> The acquire_sample_rows has limit on number of rows it returns so that's
> why it has to do the visibility check before as the problem you described
> applies there.
>
>
Even if in tablesample method, the argument value is in
percentage that doesn't mean not to give any consideration to
number of rows returned.  The only difference I could see is with
tablesample method the number of rows returned will not be accurate
number.  I think it is better to consider only visible rows.
> The reason for using the probability instead of tuple limit is the fact
> that there is no way to accurately guess number of tuples in the table at
> the beginning of scan. This method should actually be better at returning
> the correct number of tuples without dependence on how many of them are
> visible or not and how much space in blocks is used.
>
>
>
>> 5.
>>
>>
>> So above test yield 60% rows first time and 100% rows next time,
>> when the test has requested 80%.
>> I understand that sample percentage will result an approximate
>> number of rows, however I just wanted that we should check if the
>> implementation has any problem or not?
>>
>
> I think this is caused by random generator not producing smooth enough
> random distribution on such a small sample (or possibly in general?).
>
>
Yeah it could be possible, I feel we should check with large
sample of rows to see if there is any major difference?
>
>> In this regard, I have one question related to below code:
>>
>> Why are we not considering tuple in above code
>> if tupoffset is less than maxoffset?
>>
>>
> We consider it, I will rename the samplesize to probability or something
> to make it more clear what it actually is and maybe expand the comment
> above the loop.
>
Makes sense.
> 6.
>
>> One larger question about the approach used in patch, why do you
>> think it is better to have a new node (SampleScan/SamplePath) like
>> you have used in patch instead of doing it as part of Sequence Scan.
>> I agree that we need some changes in different parts of Sequence Scan
>> execution, but still sounds like a viable way.  One simple thing that
>> occurs to me is that in ExecSeqScan(), we can use something like
>> SampleSeqNext instead of SeqNext to scan heap in a slightly different
>> way, probably doing it as part of sequence scan will have advantage that
>> we can use most of the existing infrastructure (sequence node path)
>> and have less discrepancies as mentioned in point-4.
>>
>>
> I originally started from SeqScan but well, it requires completely
> different State structure, it needs to call sampling methods in various
> places (not just for next tuple), it needs different handling in explain
> and in optimizer (costing). If we add all this to sequential scan it will
> not be that much different from new scan node (we'd save the couple of new
> copy functions and one struct, but I'd rather have clearly distinct scan).
>
>
I understand that point, but I think it is worth considering if
it can be done as SeqScan node especially because plan node
doesn't need to store any additional information for sample scan.
I think this point needs some more thoughts and if none of us
could come with a clean way to do it via seqscan node then we can
proceed with a separate node idea.
Another reason why I am asking to consider it is that after
spending effort on further review and making the current approach
robust, it should not happen that someone else (probably one
of the committers) should say that it can be done with sequence scan
node without much problems.
> It would also not help with the discrepancies you mentioned because those
> are in heapam and SampleScan would not use that even if it was merged with
> SeqScan - I don't think we want to implement the sampling on heapam level,
> it's too much of a hotspot to be good idea to add additional cycles there.
I have no intention in adding more cycles to heap layer, rather
try to use some of the existing API's if possible.
One another separate observation:
typedef struct SamplePath
{
Path path;
Oid tsmcost; /*
table sample method costing function */
List   *tsmargs; /* arguments to a TABLESAMPLE clause
*/
} SamplePath;
a.
Do we really need to have tsmcost and tsmargs stored in SamplePath
when we don't want to maintain them in plan (SamplePlan), patch gets
all the info via RTE in executor, so it seems to me we can do without
them.
b.
* SamplePath represents a sample sacn
typo /sacn/scan
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Kyotaro HORIGUCHI | 2015-01-19 06:24:16 | Re: Async execution of postgres_fdw. | 
| Previous Message | Michael Paquier | 2015-01-19 05:50:39 | Re: pg_rewind in contrib |