Re: TABLESAMPLE usage

From: Matija Lesar <matija(dot)lesar(at)gmail(dot)com>
To: Tom Smith <tomsmith1989sk(at)gmail(dot)com>
Cc: Vik Fearing <vik(at)2ndquadrant(dot)fr>, PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: TABLESAMPLE usage
Date: 2016-01-25 09:44:33
Message-ID: CAPx3hmPVfnaw6kZzTSSseLqiDwiF174MMJ0vj8RdRu70FipTnA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 25 January 2016 at 09:55, Tom Smith <tomsmith1989sk(at)gmail(dot)com> wrote:

> Thanks, the solution would work for fixed interval timestamp.
> But the data I am dealing with has irregular timestamp so can not be
> generated with exact steps.
>
> I would consider this a special case/method of random sampling, evenly
> distributed sampling according to the defined timestamp index.
>
> On Mon, Jan 25, 2016 at 3:48 AM, Vik Fearing <vik(at)2ndquadrant(dot)fr> wrote:
>
>> On 01/25/2016 05:09 AM, Tom Smith wrote:
>> > Hello:
>> >
>> > I have a big table with that is always appended with new data with a
>> unique
>> > sequence id (always incremented, or timestamp as unique index) each
>> row.
>> > I'd like to sample, say 100 rows out of say 1000 rows evently across all
>> > the rows,
>> > so that it would return rows of1, 101, 201, 301 you get idea.
>> > can TABLESAMPLE get one row for every 100 rows, based on the order
>> > of the rows added to table using the timestamp as already indexed/sorted
>> > sequence
>>
>> No, TABLESAMPLE is intended to take a random sampling of the data using
>> various methods.
>>
>> You're looking for something more like this:
>>
>> select t.*
>> from generate_series(1, (select max(id) from t), 100) g
>> join t on t.id = g;
>> --
>> Vik Fearing +33 6 46 75 15 36
>> http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
>>
>
>

Hi,

you can accomplish this with row_number()
<http://www.postgresql.org/docs/8.4/static/functions-window.html#FUNCTIONS-WINDOW-TABLE>
:

WITH data_cte as (
SELECT
id,
clock_timestamp() as ctimestamp
FROM generate_series(1,1000) as id
)
SELECT
*
FROM
(SELECT
id,
ctimestamp,
row_number() OVER (ORDER BY ctimestamp) as rownum
FROM data_cte
) as data_withrownumbers
WHERE
rownum%100=1;

Bye,
Matija Lesar

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Brian Cardarella 2016-01-25 10:16:23 Possible to dump/load a database from within psql?
Previous Message David Rowley 2016-01-25 09:44:06 Re: Performance options for CPU bound multi-SUM query