Re: tablesample performance

From: Francisco Olarte <folarte(at)peoplecall(dot)com>
To: Andy Colson <andy(at)squeakycode(dot)net>
Cc: pgsql <pgsql-general(at)postgresql(dot)org>
Subject: Re: tablesample performance
Date: 2016-10-18 17:53:50
Message-ID: CA+bJJbxfQym-n8N5w5N56GF6QCmB2SoT2ySnysPFAad-D2EBcg@mail.gmail.com
Lists: pgsql-general

Andy:

On Tue, Oct 18, 2016 at 7:17 PM, Andy Colson <andy(at)squeakycode(dot)net> wrote:
> Ah, yes, you're right, there is a bit of a difference there.
>
> Speed wise:
> 1) select one from ones order by random() limit 1;
>> about 360ms
> 2) select one from ones tablesample bernoulli(1) limit 1 ;
>> about 4ms
> 3) select one from ones tablesample bernoulli(1) order by random() limit 1;
>> about 80ms

Expected. It would be nice if you had provided some table structure / size data.
>
> Using the third option in batch, I'm getting about 15 transactions a second.
>
> Oddly:
> select one from ones tablesample bernoulli(0.25) order by random()
> takes almost 80ms also.

Mmm, it depends a lot on your total rows and average rows per page.

> bernoulli(0.25) returns 3k rows
> bernoulli(1) returns 14k rows

This hints at about 1.4M rows (14k / 1%). If your rows are small and you
have more than 400 rows per page I would expect that, as a 0.25% sample
picks 1 row in 400 and so would, on average, hit every page.
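
To put rough numbers on that, here is a small sketch of the arithmetic. It assumes each row is sampled independently with the given probability (which is how a Bernoulli sample is normally described); the function name and the rows-per-page values are made up for illustration and are not measurements from Andy's table.

```python
def page_hit_probability(sample_pct: float, rows_per_page: int) -> float:
    """Chance that a page holds at least one sampled row,
    assuming rows are picked independently with probability sample_pct/100."""
    p = sample_pct / 100.0
    return 1.0 - (1.0 - p) ** rows_per_page


# Table size implied by the thread: bernoulli(1) returned about 14k rows.
estimated_rows = 14_000 / 0.01  # roughly 1.4 million

# Hypothetical rows-per-page values, only to show the trend.
for rows_per_page in (50, 100, 400, 1000):
    prob = page_hit_probability(0.25, rows_per_page)
    print(f"{rows_per_page:5d} rows/page -> {prob:.3f} of pages hold a sampled row")
```

At 400 rows per page the sample averages one row per page, although under this independence assumption some pages still end up with none, so "hits every page" is an approximation that gets closer as rows per page grows.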

Tom hinted you at an extension. Also, if you are in a function (which
can loop) you can do a little trick: instead of bernoulli(1) use
bernoulli(100*N/table_size) (the argument is a percentage). This way
you will select very few rows and speed up the last phase, the
order by random() sort. Anyway, I fear bernoulli must read the whole
table too, to be able to discard rows randomly, so you may not gain
anything. I would compare the query time against a simple
'select count(one)' query, to have a benchmark of how much time the
server spends reading the table. I would bet on 'about 80 ms'.
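
A minimal sketch of the N/table_size idea, written as a client-side helper that only builds the query text. The helper name, the choice of N = 10 and the 1.4M row count are assumptions for illustration; the table and column names come from the queries quoted above.

```python
def bernoulli_percent(wanted_rows: int, table_rows: int) -> float:
    """Percentage to pass to TABLESAMPLE BERNOULLI so that about
    wanted_rows rows come back (the argument is a percentage, 0..100)."""
    if table_rows <= 0:
        raise ValueError("table_rows must be positive")
    return min(100.0, 100.0 * wanted_rows / table_rows)


table_rows = 1_400_000  # assumed: estimated from 14k rows at a 1% sample
pct = bernoulli_percent(10, table_rows)  # aim for about 10 candidate rows

# Query that samples only a handful of rows before the random sort.
sample_query = (
    f"select one from ones tablesample bernoulli({pct:.6f}) "
    "order by random() limit 1"
)

# Baseline to time against: how long a plain read of the table takes.
baseline_query = "select count(one) from ones"

print(sample_query)
print(baseline_query)
```

With so few expected rows the sample can occasionally come back empty, which is presumably why this works best inside a function that can loop and retry. Timing both statements side by side would show how much of the 80 ms is just reading the table.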

Francisco Olarte.
