Re: Bug? Small samples in TABLESAMPLE SYSTEM returns zero rows

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Bug? Small samples in TABLESAMPLE SYSTEM returns zero rows
Date: 2015-08-06 19:57:04
Message-ID: 55C3BC10.2080604@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 08/06/2015 12:45 PM, Tom Lane wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
>> On 6 August 2015 at 20:14, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>>> Speaking from a user perspective, SYSTEM seems broken to me. I can't
>>> imagine using it for anything with a that degree of variation in the
>>> number of results returned, especially if it's possible to return zero
>>> rows from a populated table.
>
>> Please bear in mind you have requested a very small random sample of blocks.
>
> Indeed. My expectation about it is that you'd get the requested number of
> rows *on average* over many tries (which is pretty much what Josh's
> results show). Since what SYSTEM actually returns must be a multiple of
> the number of rows per page, if you make a request that's less than that
> number of rows, you must get zero rows some of the time. Otherwise the
> sampling logic is cheating.

Except that I'm getting zero results when requesting rows which equate
to 2-3 pages, not just less than one page. Please do read the full bug
report.

And why should the *number* of pages sampled be random? That's not what
the documentation says that SYSTEM does. But it's pretty clearly what's
happening; this table has exactly 185 rows per page on every page. So
the only reason for there to be a variation in the number of rows
returned is that SYSTEM is varying the number of pages sampled.

On testing, the zero results don't go away until the number of pages
requested goes up to 5 (0.095). So that suggests that SYSTEM is varying
the number of pages retrieved by +/- 4. Why? What purpose does that serve?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2015-08-06 20:08:16 Re: 9.5 release notes
Previous Message Tom Lane 2015-08-06 19:45:02 Re: Bug? Small samples in TABLESAMPLE SYSTEM returns zero rows