Re: Optimizing Time Series Access

From: Robert Burgholzer <rburghol(at)vt(dot)edu>
To: Dave Duke <dave(dot)duke(at)cryptic(dot)co(dot)uk>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Optimizing Time Series Access
Date: 2014-04-09 19:48:57
Message-ID: CACT-NG+bKY96EA3pYMyDMJU19EAyzdZrx1Y+GQo4f1MNQT9q=A@mail.gmail.com
Lists: pgsql-performance

Dave,
Thanks for asking about the structure. It appears to me to be only moderately
structured; below I list the aspects that I think are well defined (STRUCTURED)
and those that are more variable (MODERATELY STRUCTURED). A rough schema sketch
follows the two lists.

STRUCTURED:
Location - Values are keyed to a location, and there are only about 500
locations in my data set, so in principle the data could be organized (or
partitioned) by these locations
dataval - Always numerical data

MODERATELY STRUCTURED:
timestamp - A set of values will be sequential in time, but on a variable
scale (15 minutes, 1 hour, and 1 day are the typical time steps).
scenarioid - There may be several copies of each piece of data, representing
different model "scenarios".
param_group, param_block & param_name - Descriptors of a piece of data.
There is no fixed limit on how many of these there can be, depending upon what
our models are doing, but most models use between 3 and 10 parameters.
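
To make that concrete, here is a rough sketch of the layout in SQL. The table
name, the column types, and the choice of index columns are my guesses from
the descriptions above, not the actual schema:

-- hypothetical layout, inferred from the column descriptions above
CREATE TABLE model_timeseries (
    location     integer          NOT NULL,  -- ~500 distinct locations
    scenarioid   integer          NOT NULL,  -- model scenario copy
    param_group  text             NOT NULL,
    param_block  text             NOT NULL,
    param_name   text             NOT NULL,
    "timestamp"  timestamptz      NOT NULL,  -- 15 min / 1 hour / 1 day steps
    dataval      double precision            -- always numeric
);

-- a multicolumn index matching the retrieval pattern; which 3 columns
-- actually form the key is a guess here
CREATE INDEX model_timeseries_retrieval_idx
    ON model_timeseries (location, scenarioid, param_name, "timestamp");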

On Tue, Apr 8, 2014 at 7:30 PM, Dave Duke <dave(dot)duke(at)cryptic(dot)co(dot)uk> wrote:

> Could you be more specific about whether the data is random or structured
> in some way?
>
> On 8 Apr 2014, at 22:20, Robert Burgholzer <rburghol(at)vt(dot)edu> wrote:
>
> I am looking for advice on dealing with large tables of environmental
> model data, and for alternatives to my current optimization approaches.
> Basically, I have about 1 billion records stored in a table which I access
> in groups of roughly 23 million at a time, which means I have somewhere in
> the neighborhood of 400-500 sets of 23 million points.
>
> The 23 million rows that I pull at a time are keyed on 3 different columns,
> it's all indexed, and retrieval takes, say, 2-3 minutes (my hardware is
> so-so). So, my thought is to use some kind of caching, and I wonder if I
> can get advice - here are my thoughts on options; I would love to hear others:
>
> * Use cached tables for this - since my number of actual data groups is
> small, why not just retrieve each group once, then keep it around in a
> specially named table (I do this with some other data, using a 30-day cache
> expiration)
> * Use some sort of stored procedure? I don't even know whether such a thing
> really exists in PG or how it works.
> * Use table partitioning?
>
> Thanks,
> /r/b
>
> --
> --
> Robert W. Burgholzer
> 'Making the simple complicated is commonplace; making the complicated
> simple, awesomely simple, that's creativity.' - Charles Mingus
> Athletics: http://athleticalgorithm.wordpress.com/
> Science: http://robertwb.wordpress.com/
> Wine: http://reesvineyard.wordpress.com/
>
>
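
Coming back to the options in the quoted message: for the cached-table idea,
a minimal sketch, assuming the hypothetical model_timeseries layout sketched
earlier (the cache table name, location 42, scenario 7, and param 'flow' are
all made up for illustration):

-- build a cache table once for one group of ~23 million rows
CREATE TABLE cache_loc42_scen7 AS
SELECT "timestamp", param_name, dataval
  FROM model_timeseries
 WHERE location = 42
   AND scenarioid = 7
 ORDER BY "timestamp";

-- later reads hit the cache table instead of the 1-billion-row table
SELECT * FROM cache_loc42_scen7 WHERE param_name = 'flow';

On PostgreSQL 9.3 or later, a materialized view (CREATE MATERIALIZED VIEW /
REFRESH MATERIALIZED VIEW) gives much the same effect, with the refresh
bookkeeping handled by the database.

For the partitioning option, PostgreSQL 9.x does this through table
inheritance plus CHECK constraints; a sketch, again with hypothetical names
and ranges:

-- parent table holds no rows; children carry CHECK constraints so the
-- planner can skip partitions that cannot match the WHERE clause
CREATE TABLE model_timeseries_part (LIKE model_timeseries);

CREATE TABLE model_timeseries_loc_001_100 (
    CHECK (location BETWEEN 1 AND 100)
) INHERITS (model_timeseries_part);

CREATE TABLE model_timeseries_loc_101_200 (
    CHECK (location BETWEEN 101 AND 200)
) INHERITS (model_timeseries_part);

SET constraint_exclusion = partition;
SELECT * FROM model_timeseries_part WHERE location = 42;  -- scans one child

With inheritance partitioning, inserts normally need a trigger on the parent
to route rows into the correct child, and indexes have to be created on each
child table separately.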

--
--
Robert W. Burgholzer
'Making the simple complicated is commonplace; making the complicated
simple, awesomely simple, that's creativity.' - Charles Mingus
Athletics: http://athleticalgorithm.wordpress.com/
Science: http://robertwb.wordpress.com/
Wine: http://reesvineyard.wordpress.com/
