| From: | Petr Jelinek <petr(at)2ndquadrant(dot)com> | 
|---|---|
| To: | Josh Berkus <josh(at)agliodbs(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com> | 
| Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> | 
| Subject: | Re: Bug? Small samples in TABLESAMPLE SYSTEM returns zero rows | 
| Date: | 2015-08-06 20:22:31 | 
| Message-ID: | 55C3C207.8040608@2ndquadrant.com | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
On 2015-08-06 22:17, Josh Berkus wrote:
> On 08/06/2015 01:14 PM, Josh Berkus wrote:
>> On 08/06/2015 01:10 PM, Simon Riggs wrote:
>>> Given, user-stated probability of accessing a block of P and N total
>>> blocks, there are a few ways to implement block sampling.
>>>
>>> 1. Test P for each block individually. This gives a range of possible
>>> results, with 0 blocks being possible outcome, though decreasing in
>>> probability as P increases for fixed N. This is the same way BERNOULLI
>>> works, we just do it for blocks rather than rows.
>>>
>>> 2. We calculate P/N at start of scan and deliver this number blocks by
>>> random selection from N available blocks.
>>>
>>> At present we do (1), exactly as documented. (2) is slightly harder
>>> since we'd need to track which blocks have been selected already so we
>>> can use a random selection with no replacement algorithm. On a table
>>> with uneven distribution of rows this would still return a variable
>>> sample size, so it didn't seem worth changing.
>>
>> Aha, thanks!
>>
>> So, seems like this is just a doc issue? That is, we just need to
>> document that using SYSTEM on very small sample sizes may return
>> unexpected numbers of results ... and maybe also how the algorithm
>> actually works.
>
Yes, it's expected behavior on very small sample size so doc patch seems 
best fix.
> Following up on this ... where is TABLESAMPLE documented other than in
> the SELECT command?  Doc search on the website is having issues right
> now.  I'm happy to write a doc patch.
>
The user documentation is only in SELECT page, the rest is API docs.
-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Josh Berkus | 2015-08-06 20:25:48 | Re: Bug? Small samples in TABLESAMPLE SYSTEM returns zero rows | 
| Previous Message | Simon Riggs | 2015-08-06 20:19:03 | Re: Bug? Small samples in TABLESAMPLE SYSTEM returns zero rows |