From: | Josh Berkus <josh(at)agliodbs(dot)com> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Bug? Small samples in TABLESAMPLE SYSTEM returns zero rows |
Date: | 2015-08-06 19:14:42 |
Message-ID: | 55C3B222.1060301@agliodbs.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Version: 9.5alpha2
Issue: when requesting small samples, SYSTEM often returns zero rows,
and sometimes returns unexpected numbers of rows.
Example:
create table thous ( id int, val text );
insert into thous select i, i::text || '-val' from
generate_series(1,100000) as gs(i);
analyze;
This is a 100,000 row table, so a 0.01% sample should be 10 rows. Since
10 rows is far less than what's on one data page, the documentation
suggests that I should get 1 data page worth of rows. However:
postgres=# select * from thous tablesample system ( 0.01 );
id | val
----+-----
(0 rows)
Time: 0.636 ms
... this query consistently returns 0 rows, in 20 runs.
Ok,let's try a million-row table, which is the example we have in the docs.
postgres=# select * from mil tablesample system ( 0.01 );
id | val
----+-----
(0 rows)
Hmmm?
On testing, the query against the million-row table returns 0 rows
around 50% of the time.
This table has around 185 rows per page. As the sample size goes up,
the number times I get zero rows goes down, but those results seem to
still include data pages with zero rows. For example, here's a series
of results from a 0.04 sample against the million-row table.
370
370
370
555
555
185
0
925
Since this is a synthetic table I just generated, it contains almost
exactly 185 rows per data page for every data page. So on a 0.04%
sample, the variation between 370 rows and 555 rows (whether we have 2
or 3 data pages) is expected, since 0.04% of 5406 data pages is 2.16 pages.
The results of 0, 185 and 925 are not. It really seems like SYSTEM is
treating 0.04% as a maximum, but taking a random number of data pages
somewhere around that maximum, using math which can choose numbers of
pages far outside of the % requested by the user, and which includes 0.
Speaking from a user perspective, SYSTEM seems broken to me. I can't
imagine using it for anything with a that degree of variation in the
number of results returned, especially if it's possible to return zero
rows from a populated table.
BERNOULLI works as expected.
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
From | Date | Subject | |
---|---|---|---|
Next Message | Simon Riggs | 2015-08-06 19:27:02 | Re: Bug? Small samples in TABLESAMPLE SYSTEM returns zero rows |
Previous Message | Robert Haas | 2015-08-06 18:46:17 | Re: Race conditions in shm_mq.c |