Re: tablesample performance

From:	Simon Riggs <simon(at)2ndquadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Andy Colson <andy(at)squeakycode(dot)net>, Francisco Olarte <folarte(at)peoplecall(dot)com>, pgsql <pgsql-general(at)postgresql(dot)org>
Subject:	Re: tablesample performance
Date:	2016-10-18 19:53:56
Message-ID:	CANP8+j+QpasvS0NSm9o72+i9nZb87pJRYxX=s372JkOhPQFiHA@mail.gmail.com
Lists:	pgsql-general

On 18 October 2016 at 19:34, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Andy Colson <andy(at)squeakycode(dot)net> writes:
>> On 10/18/2016 11:44 AM, Francisco Olarte wrote:
>>> This should be faster, but to me it seems it does a different thing.
>
>> Ah, yes, you're right, there is a bit of a difference there.
>
> If you don't want to have an implicit bias towards earlier blocks,
> I don't think that either standard tablesample method is really what
> you want.
>
> The contrib/tsm_system_rows tablesample method is a lot closer, in
> that it will start at a randomly chosen block, but if you just do
> "tablesample system_rows(1)" then you will always get the first row
> in whichever block it lands on, so it's still not exactly unbiased.

Is there a reason why we can't fix the behaviours of the three methods
mentioned above by making them all start at a random block and a
random item between min and max?

It was never intended to be biased, and bernoulli in particular ought
to be strictly unbiased.
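
To make the difference concrete, here is a toy simulation in Python. It is
only a sketch, not PostgreSQL code: the function names and the table layout
(4 blocks of 5 rows each) are made up for illustration, and it assumes every
block holds the same number of rows. It compares "random block, always the
first item", as described for system_rows(1) above, with "random block and a
random item".

```python
import random
from collections import Counter


def first_item_of_random_block(blocks, rng):
    """Toy model of the behaviour described above for system_rows(1):
    land on a random block, then always return its first row."""
    block = rng.randrange(len(blocks))
    return (block, 0)


def random_item_of_random_block(blocks, rng):
    """Toy model of the proposal: land on a random block, then pick a
    random item offset within that block."""
    block = rng.randrange(len(blocks))
    item = rng.randrange(blocks[block])
    return (block, item)


def sample_counts(picker, blocks, trials, seed=0):
    """Run `picker` repeatedly and count how often each (block, item) is hit."""
    rng = random.Random(seed)
    return Counter(picker(blocks, rng) for _ in range(trials))


# Hypothetical table: blocks[i] is the number of rows stored in block i.
blocks = [5, 5, 5, 5]

before = sample_counts(first_item_of_random_block, blocks, 10_000)
after = sample_counts(random_item_of_random_block, blocks, 10_000)

print(len(before), "distinct rows reachable when always taking the first item")
print(len(after), "distinct rows reachable with a random starting item")
```

In this toy table the first picker can only ever return 4 of the 20 rows,
whereas the second can reach every row. The sketch says nothing about tables
whose blocks hold differing numbers of rows.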

Happy to patch if we agree.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Re: tablesample performance at 2016-10-18 17:34:32 from Tom Lane

Responses

Re: tablesample performance at 2016-10-18 20:06:20 from Tom Lane
