From:       P Kishor <punk(dot)kish(at)gmail(dot)com>
To:         Andy Colson <andy(at)squeakycode(dot)net>
Cc:         PostgreSQL <pgsql-general(at)postgresql(dot)org>
Subject:    Re: optimizing daily data storage in Pg
Date:       2010-07-23 05:39:16
Message-ID: AANLkTimDtr=nF3L6xwPZ3hh_DFuh0JY9naZcGWyvF2ST@mail.gmail.com
Lists:      pgsql-general
On Thu, Jul 22, 2010 at 4:56 PM, Andy Colson <andy(at)squeakycode(dot)net> wrote:
> On 7/22/2010 9:41 AM, P Kishor wrote:
>>
>> I have been struggling with this for a while now, have even gone down
>> a few paths but struck out, so I turn now to the community for ideas.
>> First, the problem: Store six daily variables for ~ 25 years for cells
>> in a grid.
>>
>> * Number of vars = 6
>> * Number of cells ~ 13 million
>> * Number of days ~ 9125 (25 * 365)
>>
>> Optimize the store for two different kinds of queries:
>>
>> Query one: Retrieve the value of a single var for all or a portion of
>> the cells for a single day. This is analogous to an image where every
>> pixel is the value of a single var.
>
>> SELECT <var> FROM d WHERE yr = ? AND yday = ?;
>> SELECT <var> FROM d WHERE yr = ? AND yday = ? AND cell_id IN
>> (?,?,?...);
>
>
>>
>> Query two: Retrieve values for all the days or a duration of days for
>> a single var for a single cell. This is like grabbing a column out of
>> a table in which each row holds all the vars for a single day.
>
>> SELECT <var> FROM d WHERE cell_id = ?;
>> SELECT <var> FROM d WHERE cell_id IN (?,?,?...);
>
>
>
> First, I must admit to not reading your entire email.
I am not sure how to respond to your feedback given that you haven't
read the entire email. Nevertheless, thanks for writing...
>
> Second, Query 1 should be fast, regardless of how you lay out the tables.
It is not fast. Right now I have data for about 250,000 cells loaded.
That comes to circa 92 million rows per year. Performance is pretty
sucky.
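For context, here is a minimal sketch of the kind of table and index
those queries assume. The table name and key columns come from the
queries quoted above; the column types and the var columns are only
illustrative, since the actual DDL isn't in this thread:

    CREATE TABLE d (
        cell_id integer  NOT NULL,
        yr      smallint NOT NULL,
        yday    smallint NOT NULL,
        var1    real,
        -- vars 2 through 6 elided
        PRIMARY KEY (cell_id, yr, yday)
    );

    -- query one filters on (yr, yday) across many cells, so it needs an
    -- index whose leading columns match that WHERE clause; the primary
    -- key above, led by cell_id, only helps query two
    CREATE INDEX d_yr_yday_idx ON d (yr, yday);

Without an index led by (yr, yday), every query-one request has to scan
all of the ~92 million rows for that year.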
>
> Third, Query 2 will return 13M rows? I don't think it matters how you lay
> out the tables, returning 13M rows is always going to be slow.
>
Yes, I understand that. In reality I will never get 13 M rows. For
display purposes, I will probably get around 10,000 rows to 50,000
rows. When more rows are needed, it will be to feed a model, so that
can be offline (without an impatient human being waiting on the other
end).
Right now, my main problem is that I have either too many rows (~4 B
rows) in a manageable number of tables (25 tables) or a manageable
number of rows (~13 M rows) in too many tables (~9000 tables).
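One possible middle ground between those two extremes would be
inheritance-based partitioning with constraint exclusion, which is what
Pg offers for this as of 8.4/9.0. A rough sketch, with names and types
illustrative only:

    CREATE TABLE d (
        cell_id integer  NOT NULL,
        yr      smallint NOT NULL,
        yday    smallint NOT NULL,
        var1    real
        -- vars 2 through 6 elided
    );

    -- one child table per year, each with a CHECK constraint the planner
    -- can use to prune partitions
    CREATE TABLE d_2005 (CHECK (yr = 2005)) INHERITS (d);
    CREATE INDEX d_2005_yday_idx ON d_2005 (yday);  -- yr is fixed per child
    CREATE INDEX d_2005_cell_idx ON d_2005 (cell_id);

    -- with constraint_exclusion = partition (the default since 8.4),
    --   SELECT var1 FROM d WHERE yr = 2005 AND yday = 123;
    -- touches only d_2005 instead of all 25 children

That keeps the per-table row counts at roughly the one-year level while
still letting both query patterns go through a single parent table.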
>
> -Andy
>
--
Puneet Kishor http://www.punkish.org
Carbon Model http://carbonmodel.org
Charter Member, Open Source Geospatial Foundation http://www.osgeo.org
Science Commons Fellow, http://sciencecommons.org/about/whoweare/kishor
Nelson Institute, UW-Madison http://www.nelson.wisc.edu
-----------------------------------------------------------------------
Assertions are politics; backing up assertions with evidence is science
=======================================================================