Re: tablesample performance

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Andy Colson <andy(at)squeakycode(dot)net>
Cc:	Francisco Olarte <folarte(at)peoplecall(dot)com>, pgsql <pgsql-general(at)postgresql(dot)org>
Subject:	Re: tablesample performance
Date:	2016-10-18 17:34:32
Message-ID:	24207.1476812072@sss.pgh.pa.us
Lists:	pgsql-general

Andy Colson <andy(at)squeakycode(dot)net> writes:
> On 10/18/2016 11:44 AM, Francisco Olarte wrote:
>> This should be faster, but to me it seems it does a different thing.

> Ah, yes, you're right, there is a bit of a difference there.

If you don't want to have an implicit bias towards earlier blocks,
I don't think that either standard tablesample method is really what
you want.

The contrib/tsm_system_rows tablesample method is a lot closer, in
that it will start at a randomly chosen block, but if you just do
"tablesample system_rows(1)" then you will always get the first row
in whichever block it lands on, so it's still not exactly unbiased.
Maybe you could select "tablesample system_rows(100)" or so and then
do the order-by-random trick on that sample. This would be a lot
faster than selecting 100 random rows with either built-in sample
method, since the rows it grabs will be consecutive.

regards, tom lane
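
The suggestion in the message above (take a block-local sample with system_rows, then shuffle it) might look roughly like the following. This is only a sketch: the table name big_table is a hypothetical placeholder, and it assumes the contrib extension tsm_system_rows is available to install on the server.

```sql
-- One-time setup: tsm_system_rows ships in contrib and must be installed
-- in the database before SYSTEM_ROWS can be used.
CREATE EXTENSION IF NOT EXISTS tsm_system_rows;

-- big_table is a hypothetical table name used only for illustration.
-- Step 1: SYSTEM_ROWS(100) starts at a randomly chosen block and reads
--         about 100 consecutive rows, which keeps the I/O cheap.
-- Step 2: ORDER BY random() shuffles only that small sample, so the
--         row returned is not always the first row of the block.
SELECT s.*
FROM (
    SELECT *
    FROM big_table TABLESAMPLE SYSTEM_ROWS(100)
) AS s
ORDER BY random()
LIMIT 1;
```

Because the 100 rows come from adjacent blocks, the result is still clustered rather than uniformly random over the whole table; the shuffle only removes the bias towards the first row of the chosen block. Raising the LIMIT returns several rows from the same neighbourhood.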

In response to

Re: tablesample performance at 2016-10-18 17:17:01 from Andy Colson

Responses

Re: tablesample performance at 2016-10-18 19:53:56 from Simon Riggs
