Re: optimizing daily data storage in Pg

From: P Kishor <punk(dot)kish(at)gmail(dot)com>
To: Andy Colson <andy(at)squeakycode(dot)net>
Cc: PostgreSQL <pgsql-general(at)postgresql(dot)org>
Subject: Re: optimizing daily data storage in Pg
Date: 2010-07-23 05:39:16
Message-ID: AANLkTimDtr=nF3L6xwPZ3hh_DFuh0JY9naZcGWyvF2ST@mail.gmail.com
Lists: pgsql-general

On Thu, Jul 22, 2010 at 4:56 PM, Andy Colson <andy(at)squeakycode(dot)net> wrote:
> On 7/22/2010 9:41 AM, P Kishor wrote:
>>
>> I have been struggling with this for a while now, have even gone down
>> a few paths but struck out, so I turn now to the community for ideas.
>> First, the problem: Store six daily variables for ~ 25 years for cells
>> in a grid.
>>
>>  * Number of vars = 6
>>  * Number of cells ~ 13 million
>>  * Number of days ~ 9125 (25 * 365)
>>
>> Optimize the store for two different kinds of queries:
>>
>> Query one: Retrieve the value of a single var for all or a portion of
>> the cells for a single day. This is analogous to an image where every
>> pixel is the value of a single var.
>
>>      SELECT <var> FROM d WHERE yr = ? AND yday = ?;
>>      SELECT <var> FROM d WHERE yr = ? AND yday = ? AND cell_id IN
>> (?,?,?...);
>
>
>>
>> Query two: Retrieve values for all the days or a duration of days for
>> a single var for a single cell. This is like grabbing a column out of
>> a table in which each row holds all the vars for a single day.
>
>>      SELECT <var> FROM d WHERE cell_id = ?;
>>      SELECT <var> FROM d WHERE cell_id IN (?,?,?...);
>
>
>
> First, I must admit to not reading your entire email.

I am not sure how to respond to your feedback given that you haven't
read the entire email. Nevertheless, thanks for writing...

>
> Second, Query 1 should be fast, regardless of how you lay out the tables.

It is not fast. Right now I have data for about 250,000 cells loaded;
at 365 days per year, that comes to roughly 92 million rows per year,
and performance is pretty sucky.
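
For concreteness, below is a minimal sketch of the kind of single
table the queries above assume (var1..var6 are placeholder names for
the six variables; my real columns differ), plus the composite
indexes the two query patterns want:

    -- minimal sketch; var1..var6 are placeholder names for the six vars
    CREATE TABLE d (
        cell_id  integer  NOT NULL,
        yr       smallint NOT NULL,
        yday     smallint NOT NULL,  -- day of year, 1..365
        var1 real, var2 real, var3 real,
        var4 real, var5 real, var6 real
    );

    -- Query one filters on (yr, yday); this index jumps straight
    -- to a single day's slice
    CREATE INDEX d_yr_yday_idx ON d (yr, yday);

    -- Query two filters on cell_id, optionally over a span of days,
    -- so a second index leads with cell_id
    CREATE INDEX d_cell_yr_yday_idx ON d (cell_id, yr, yday);

Even with the first index, one day's slice is 250,000 rows scattered
across the heap, so the index alone may not make Query one feel fast.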

>
> Third, Query 2 will return 13M rows?  I don't think it matters how you lay
> out the tables; returning 13M rows is always going to be slow.
>

Yes, I understand that. In reality I will never fetch 13 M rows. For
display purposes, I will probably retrieve around 10,000 to 50,000
rows. When more rows are needed, it will be to feed a model, so that
work can happen offline (without an impatient human being waiting on
the other end).
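
Bounding the fetch to a window of days also keeps that second pattern
index-friendly; a hedged variant of Query two, assuming the
(cell_id, yr, yday) index sketched above:

    -- one var for one cell over a bounded window of days;
    -- with an index on (cell_id, yr, yday) this is a tight range scan
    SELECT var1
    FROM d
    WHERE cell_id = ?
      AND yr = ?
      AND yday BETWEEN ? AND ?;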

Right now, my main problem is that I have either too many rows (~4 B
rows) in a manageable number of tables (25 tables) or a manageable
number of rows (~13 M rows) in too many tables (~9000 tables).
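
One thing that might split the difference, assuming the 25-table
layout is one table per year, is to attach the per-year tables as
inheritance children of a single parent and let constraint exclusion
prune: the queries then go against one logical table, but Query one
still touches only one year's child. A minimal sketch, with d_1990 as
an illustrative child name (one such table per year):

    -- inheritance partitioning by year (the 8.4/9.0-era approach);
    -- the CHECK constraint lets the planner skip all other children
    CREATE TABLE d_1990 (
        CHECK (yr = 1990)
    ) INHERITS (d);

    -- within a child, yr is fixed, so yday alone serves Query one
    CREATE INDEX d_1990_yday_idx ON d_1990 (yday);
    CREATE INDEX d_1990_cell_idx ON d_1990 (cell_id, yday);

    -- with constraint exclusion on, this plan scans only d_1990
    SET constraint_exclusion = on;
    SELECT var1 FROM d WHERE yr = 1990 AND yday = 120;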

>
> -Andy
>

--
Puneet Kishor http://www.punkish.org
Carbon Model http://carbonmodel.org
Charter Member, Open Source Geospatial Foundation http://www.osgeo.org
Science Commons Fellow, http://sciencecommons.org/about/whoweare/kishor
Nelson Institute, UW-Madison http://www.nelson.wisc.edu
-----------------------------------------------------------------------
Assertions are politics; backing up assertions with evidence is science
=======================================================================
