From: | Simon Riggs <simon(at)2ndquadrant(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Simon Riggs <simon(at)2ndquadrant(dot)com>, Andy Colson <andy(at)squeakycode(dot)net>, Francisco Olarte <folarte(at)peoplecall(dot)com>, pgsql <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: tablesample performance |
Date: | 2016-10-18 20:41:00 |
Message-ID: | CANP8+jJ=Mct7a6jD5iXLO2rTBrKeu+0dBEX5u_kTZ7NGKLRCyg@mail.gmail.com |
Lists: | pgsql-general |
On 18 October 2016 at 22:06, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
>> On 18 October 2016 at 19:34, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> If you don't want to have an implicit bias towards earlier blocks,
>>> I don't think that either standard tablesample method is really what
>>> you want.
>>>
>>> The contrib/tsm_system_rows tablesample method is a lot closer, in
>>> that it will start at a randomly chosen block, but if you just do
>>> "tablesample system_rows(1)" then you will always get the first row
>>> in whichever block it lands on, so it's still not exactly unbiased.
>
>> Is there a reason why we can't fix the behaviours of the three methods
>> mentioned above by making them all start at a random block and a
>> random item between min and max?
>
> The standard tablesample methods are constrained by other requirements,
> such as repeatability. I am not sure that loading this one on top of
> that is a good idea. The bias I referred to above is *not* the fault
> of the sample methods, rather it's the fault of using "LIMIT 1".
Hmm, yes, that would make it a little too much of a special case.
> It does seem like maybe it'd be nice for tsm_system_rows to start at a
> randomly chosen entry in the first block it visits, rather than always
> dumping that entire block. Then "tablesample system_rows(1)" would
> actually give you a pretty random row, and I think we aren't giving up
> any useful properties it has now.
OK, will patch that.
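
To illustrate the difference being discussed, here is a rough Python sketch. It is a toy model, not the tsm_system_rows code: the table is a hypothetical list of fixed-size blocks, and the function names (make_table, sample_first_in_block, sample_random_in_block, offsets_seen) are made up for this example. It compares "random block, then first row" with "random block, then random entry in that block".

```python
import random

def make_table(n_blocks, rows_per_block):
    """Model a heap as a list of blocks, each holding (block, offset) row ids."""
    return [[(b, o) for o in range(rows_per_block)] for b in range(n_blocks)]

def sample_first_in_block(table, rng):
    """Behaviour described above: random block, then always its first row."""
    block = rng.choice(table)
    return block[0]

def sample_random_in_block(table, rng):
    """Suggested behaviour: random block, then a random entry within it."""
    block = rng.choice(table)
    return rng.choice(block)

def offsets_seen(sampler, table, trials, seed=0):
    """Collect the set of in-block offsets a sampler ever returns."""
    rng = random.Random(seed)
    return {sampler(table, rng)[1] for _ in range(trials)}

# Assumed toy dimensions: 50 blocks of 10 rows each.
table = make_table(n_blocks=50, rows_per_block=10)
first_only = offsets_seen(sample_first_in_block, table, trials=2000)
any_offset = offsets_seen(sample_random_in_block, table, trials=2000)
```

In this model the first sampler can only ever return offset 0, whatever block it lands on, while the second can reach every offset. With blocks of unequal fill the second would still not be perfectly uniform, which fits the "pretty random row" wording above.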
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services