Re: Bug? Small samples in TABLESAMPLE SYSTEM returns zero rows

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Bug? Small samples in TABLESAMPLE SYSTEM returns zero rows
Date: 2015-08-06 20:14:51
Message-ID: 55C3C03B.6080805@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 08/06/2015 01:10 PM, Simon Riggs wrote:
> Given, user-stated probability of accessing a block of P and N total
> blocks, there are a few ways to implement block sampling.
>
> 1. Test P for each block individually. This gives a range of possible
> results, with 0 blocks being possible outcome, though decreasing in
> probability as P increases for fixed N. This is the same way BERNOULLI
> works, we just do it for blocks rather than rows.
>
> 2. We calculate P/N at start of scan and deliver this number blocks by
> random selection from N available blocks.
>
> At present we do (1), exactly as documented. (2) is slightly harder
> since we'd need to track which blocks have been selected already so we
> can use a random selection with no replacement algorithm. On a table
> with uneven distribution of rows this would still return a variable
> sample size, so it didn't seem worth changing.

Aha, thanks!

So, seems like this is just a doc issue? That is, we just need to
document that using SYSTEM on very small sample sizes may return
unexpected numbers of results ... and maybe also how the algorithm
actually works.

I agree that implementing (2) makes more sense as an additional
algorithm for someone to write in the future.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Josh Berkus 2015-08-06 20:17:17 Re: Bug? Small samples in TABLESAMPLE SYSTEM returns zero rows
Previous Message Simon Riggs 2015-08-06 20:10:52 Re: Bug? Small samples in TABLESAMPLE SYSTEM returns zero rows