From: | "William Temperley" <willtemperley(at)gmail(dot)com> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | Very large tables |
Date: | 2008-11-28 15:40:35 |
Message-ID: | 439dc11e0811280740p1d34c877w5b5fc7c443be3df2@mail.gmail.com |
Lists: pgsql-general
Hi all
Has anyone any experience with very large tables?
I've been asked to store a grid of 1.5 million geographical locations,
which is fine on its own. However, associated with each point are 288
months, and associated with each month are 500 float values (a
distribution curve), i.e. 1,500,000 * 288 * 500 = 216 billion values :).
So a 216 billion row table is probably out of the question. I was
considering storing the 500 floats as bytea.
This means I'll need a table something like this:
grid_point_id | month_id | distribution_curve
    (int4)    |  (int2)  |      (bytea?)
--------------+----------+--------------------
Any advice would be appreciated, especially on the storage of the 500 floats.
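
For what it's worth, here's roughly what I had in mind for the bytea
route. It's only a rough, untested sketch (psycopg2 + struct, double
precision values, placeholder table/column names), but it shows the
packing:

    import struct
    import psycopg2

    conn = psycopg2.connect("dbname=grids")   # placeholder connection
    cur = conn.cursor()

    cur.execute("""
        CREATE TABLE distribution (
            grid_point_id      int4,
            month_id           int2,
            distribution_curve bytea,
            PRIMARY KEY (grid_point_id, month_id)
        )
    """)

    # Pack the 500 floats for one point/month into a single bytea value.
    curve = [0.0] * 500                     # the 500-value distribution curve
    blob = struct.pack('500d', *curve)      # 500 * 8 bytes = 4000 bytes per row

    cur.execute("INSERT INTO distribution VALUES (%s, %s, %s)",
                (1, 1, psycopg2.Binary(blob)))
    conn.commit()

If they really need to be double precision that's 4 kB per curve
(presumably TOASTed); packing single precision ('500f') would halve it.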
Another (somewhat far-fetched) possibility was a custom data type which
delegates its data access to HDF5 somehow, perhaps by storing a
reference to a value's location. The reason for this is that the data
will be written using PyTables and HDF5. It is produced in 500 runs,
each providing one value of the distribution curve for every point and
month (500 updates of a 500 million row table... no thanks). Querying
is the opposite: we want the whole chunk of 500 values at a time. Is
this a fantasy?
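
To make the query side concrete, this is the kind of whole-chunk
read-back I'm after with the bytea layout (again a rough, untested
sketch, same placeholder names as above):

    import struct
    import psycopg2

    conn = psycopg2.connect("dbname=grids")
    cur = conn.cursor()
    cur.execute("SELECT distribution_curve FROM distribution"
                " WHERE grid_point_id = %s AND month_id = %s", (1, 1))
    blob = cur.fetchone()[0]
    # Unpack all 500 values of the curve in one go.
    curve = struct.unpack('500d', bytes(blob))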
Cheers
Will T