From: | Simon Riggs <simon(at)2ndQuadrant(dot)com> |
---|---|
To: | Josh Berkus <josh(at)agliodbs(dot)com> |
Cc: | Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Bug? Small samples in TABLESAMPLE SYSTEM returns zero rows |
Date: | 2015-08-06 20:19:03 |
Message-ID: | CANP8+j+hcyYguG4GfQRGMWjSN-RsLGyZsE1AJAZoNZTiGCx80w@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 6 August 2015 at 21:14, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> On 08/06/2015 01:10 PM, Simon Riggs wrote:
> > Given, user-stated probability of accessing a block of P and N total
> > blocks, there are a few ways to implement block sampling.
> >
> > 1. Test P for each block individually. This gives a range of possible
> > results, with 0 blocks being possible outcome, though decreasing in
> > probability as P increases for fixed N. This is the same way BERNOULLI
> > works, we just do it for blocks rather than rows.
> >
> > 2. We calculate P/N at start of scan and deliver this number blocks by
> > random selection from N available blocks.
>
(My mistake, that would be P*N)
> > At present we do (1), exactly as documented. (2) is slightly harder
> > since we'd need to track which blocks have been selected already so we
> > can use a random selection with no replacement algorithm. On a table
> > with uneven distribution of rows this would still return a variable
> > sample size, so it didn't seem worth changing.
>
> Aha, thanks!
>
> So, seems like this is just a doc issue? That is, we just need to
> document that using SYSTEM on very small sample sizes may return
> unexpected numbers of results ... and maybe also how the algorithm
> actually works.
>
For me, the docs seem exactly correct. The mathematical implications of
that just aren't recorded explicitly.
I will try to reword or add something to make it clear that this can return
a variable number of blocks and thus produces a result with greater
variability in the number of rows returned.
It's documented on the SELECT page only; plus there is a whole new section
on writing tablesample functions.
--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Petr Jelinek | 2015-08-06 20:22:31 | Re: Bug? Small samples in TABLESAMPLE SYSTEM returns zero rows |
Previous Message | Josh Berkus | 2015-08-06 20:17:17 | Re: Bug? Small samples in TABLESAMPLE SYSTEM returns zero rows |