Quick Links

Atomic access to large arrays

From:	"Victor de Buen (Bayes)" <vdebuen(at)bayesinf(dot)com>
To:	pgsql-performance(at)postgresql(dot)org
Subject:	Atomic access to large arrays
Date:	2009-07-21 23:43:35
Message-ID:	f432ce640907211643u24bd116chcd36b7e608b900c@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

I'm storing historical meteorological gridded data from GFS (
http://www.nco.ncep.noaa.gov/pmb/products/gfs/) into an array field in a
table like this:

CREATE TABLE grid_f_data_i2 (
//Specifies the variable and other features of data
id_inventory integer REFERENCES grid_d_inventory(id_inventory),
//A new grid is available each 3 hours since 5 years ago
dh_date timestamp,
//Values are scaled to be stored as signed integers of 2 bytes
vl_grid smallint[361][720],
CONSTRAINT meteo_f_gfs_tmp PRIMARY KEY
(co_inventory, dh_date)
);

Dimensions of each value of field vl_grid are (lat:361 x lon:720 = 259920
cells} for a grid of 0.5 degrees (about each 55 Km) around the world. So,
vl_grid[y][x] stores the value at dh_date of a meteorological variable
specified by id_inventory in the geodesic point

latitude = -90 + y*0.5
longitude = x*0.5

The reverse formula for the closest point in the grid of an arbitary
geodesic point will be

y = Round((latitude+90) * 2
x = Round(longitude*2)

Field vl_grid is stored in the TOAST table and has a good compression level.
PostgreSql is the only one database that is able to store this huge amount
of data in only 34 GB of disk. It's really great system. Queries returning
big rectangular areas are very fast, but the target of almost all queries is
to get historical series for a geodesic point

SELECT dh_date, vl_grid[123][152]
FROM grid_f_data_i2
WHERE id_inventory = 6
ORDER BY dh_date

In this case, atomic access to just a cell of each one of a only few
thousands of rows becomes too slow.

Please, could somebody answer some of these questions?

- It's posible to tune some TOAST parameters to get faster atomic access
to large arrays?

- Using "EXTERNAL" strategy for storing TOAST-able columns could solve
the problem?

- Atomic access will be faster if table stores vectors for data in the
same parallel instead of matrices of global data?
CREATE TABLE grid_f_data_i2 (
//Specifies the variable and other features of data
id_inventory integer REFERENCES grid_d_inventory(id_inventory),
//A new grid is available each 3 hours since 5 years ago
dh_date timestamp,
// nu_parallel = y = Round((latitude+90) * 2
smallint nu_parallel,
//Values are scaled to be stored as signed integers of 2 bytes
vl_parallel smallint[],
CONSTRAINT meteo_f_gfs_tmp PRIMARY KEY
(co_inventory, nu_parallel, dh_date)
);

- There is another faster solution?

Thanks in advance and best regards

--
Víctor de Buen Remiro
Tol Development Team member
www.tol-project.org

Responses

Re: Atomic access to large arrays at 2009-07-22 05:11:15 from Oleg Bartunov
Re: Atomic access to large arrays at 2009-07-22 08:12:42 from Victor de Buen
Re: Atomic access to large arrays at 2009-07-22 13:54:37 from Tom Lane
Re: Atomic access to large arrays at 2009-07-22 17:59:44 from Merlin Moncure

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Kelvin Quee	2009-07-22 03:47:43	Master/Slave, DB separation or just spend $$$?
Previous Message	Oleg Bartunov	2009-07-21 19:27:28	Re: Full text search with ORDER BY performance issue