From: Randall Skelton <skelton(at)brutus(dot)uwaterloo(dot)ca>
To: pgsql-general(at)postgresql(dot)org
Subject: Moving from narrow to wide tables
Date: 2004-04-07 22:28:58
Message-ID: F6726E1E-88E2-11D8-BEF7-000393C92230@brutus.uwaterloo.ca
Lists: pgsql-general
Hi all,
I have begun the slow process of shuffling data from about 125 narrow
tables into a single wide table, and I am seeking some insight on the
fastest way to do it. The narrow tables all have the following
configuration:
===
Table "cal_0800_time"
Column | Type | Modifiers
-----------+--------------------------+-----------
timestamp | timestamp with time zone |
value | double precision |
Indexes: cal_0800_time__timestamp
Table "cal_5v_dig_rl"
Column | Type | Modifiers
-----------+--------------------------+-----------
timestamp | timestamp with time zone |
value | double precision |
Indexes: cal_5v_dig_rl__timestamp
===
As it stands, the timestamps are not unique in the narrow tables, but
they should be unique in the new table. I envision something like:
===
CREATE TABLE app_id_800 (
timestamp timestamp with time zone PRIMARY KEY
CHECK (timestamp BETWEEN '2003-08-13 02:10:00 +0' AND now()),
cal_5v_dig_rl float,
...
);
===
Followed by:
===
newtelemetry=> EXPLAIN ANALYZE INSERT INTO app_id_800(timestamp)
SELECT DISTINCT timestamp FROM cal_0800_time
WHERE timestamp BETWEEN '2004-02-21 00:00:00 +0' AND '2004-02-21 12:00:00 +0';
NOTICE: QUERY PLAN:
Subquery Scan *SELECT*  (cost=0.00..11542.36 rows=1134 width=8) (actual time=0.50..1786.02 rows=36219 loops=1)
  ->  Unique  (cost=0.00..11542.36 rows=1134 width=8) (actual time=0.47..907.77 rows=36219 loops=1)
        ->  Index Scan using cal_0800_time__timestamp on cal_0800_time  (cost=0.00..11514.01 rows=11341 width=8) (actual time=0.46..812.19 rows=37920 loops=1)
Total runtime: 23162.90 msec
EXPLAIN
newtelemetry=> EXPLAIN ANALYZE UPDATE app_id_800 SET cal_ccd_temp = cal_ccd_temp.value
FROM cal_ccd_temp
WHERE app_id_800.timestamp BETWEEN '2004-02-21 00:00:00 +0' AND '2004-02-21 12:00:00 +0'
  AND app_id_800.timestamp = cal_ccd_temp.timestamp;
NOTICE: QUERY PLAN:
Nested Loop  (cost=0.00..6.89 rows=1 width=538) (actual time=1.34..5215.49 rows=37920 loops=1)
  ->  Index Scan using app_id_800_pkey on app_id_800  (cost=0.00..3.02 rows=1 width=522) (actual time=0.82..1727.18 rows=36219 loops=1)
  ->  Index Scan using cal_ccd_temp__timestamp on cal_ccd_temp  (cost=0.00..3.86 rows=1 width=16) (actual time=0.04..0.05 rows=1 loops=36219)
Total runtime: 33979.31 msec
EXPLAIN
... 125 more UPDATE app_id_800 SET ... commands, one per narrow table
(see the example after this block) ...
===
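Each of the remaining updates follows the same pattern; for instance,
for the cal_5v_dig_rl table shown above (the exact statement below is
illustrative, using the same half-day batch):
===
UPDATE app_id_800 SET cal_5v_dig_rl = cal_5v_dig_rl.value
FROM cal_5v_dig_rl
WHERE app_id_800.timestamp BETWEEN '2004-02-21 00:00:00 +0'
                               AND '2004-02-21 12:00:00 +0'
  AND app_id_800.timestamp = cal_5v_dig_rl.timestamp;
===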
The trouble is that this takes a very long time, and it gets worse as
the time interval increases. The total time for the one INSERT and 125
UPDATEs (as above) is about 2-4 hours for one day of data (~80K rows):
===
newtelemetry=> EXPLAIN ANALYZE INSERT INTO app_id_800(timestamp)
SELECT DISTINCT timestamp FROM cal_0800_time
WHERE timestamp BETWEEN '2004-02-21 00:00:00 +0' AND '2004-02-22 00:00:00 +0';
NOTICE: QUERY PLAN:
Subquery Scan *SELECT*  (cost=0.00..40791.96 rows=4013 width=8) (actual time=0.89..4397.78 rows=72448 loops=1)
  ->  Unique  (cost=0.00..40791.96 rows=4013 width=8) (actual time=0.85..2614.95 rows=72448 loops=1)
        ->  Index Scan using cal_0800_time__timestamp on cal_0800_time  (cost=0.00..40691.63 rows=40130 width=8) (actual time=0.85..2399.50 rows=101072 loops=1)
Total runtime: 55945.59 msec
EXPLAIN
newtelemetry=> EXPLAIN ANALYZE UPDATE app_id_800 SET cal_ccd_temp = cal_ccd_temp.value
FROM cal_ccd_temp
WHERE app_id_800.timestamp BETWEEN '2004-02-21 00:00:00 +0' AND '2004-02-22 00:00:00 +0'
  AND app_id_800.timestamp = cal_ccd_temp.timestamp;
NOTICE: QUERY PLAN:
Nested Loop  (cost=0.00..6.89 rows=1 width=538) (actual time=1.08..13235.47 rows=101072 loops=1)
  ->  Index Scan using app_id_800_pkey on app_id_800  (cost=0.00..3.02 rows=1 width=522) (actual time=0.55..3647.76 rows=72448 loops=1)
  ->  Index Scan using cal_ccd_temp__timestamp on cal_ccd_temp  (cost=0.00..3.86 rows=1 width=16) (actual time=0.05..0.07 rows=1 loops=72448)
Total runtime: 68472.13 msec
EXPLAIN
===
Any ideas on how I can speed this along? I have 4 months of data and
counting :(
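One alternative I have considered but not benchmarked is building each
batch with a single multi-way join instead of 125 separate UPDATEs.
A rough sketch follows, showing only two of the 125 value columns; it
assumes duplicate timestamps within a narrow table always carry the
same value (otherwise the primary key would still reject the rows):
===
-- cal_ccd_temp and cal_5v_dig_rl stand in for all 125 narrow tables
INSERT INTO app_id_800 (timestamp, cal_ccd_temp, cal_5v_dig_rl)
SELECT t.timestamp, c.value, d.value
FROM (SELECT DISTINCT timestamp FROM cal_0800_time
      WHERE timestamp BETWEEN '2004-02-21 00:00:00 +0'
                          AND '2004-02-22 00:00:00 +0') AS t
  LEFT JOIN (SELECT DISTINCT timestamp, value FROM cal_ccd_temp)  AS c
         ON c.timestamp = t.timestamp
  LEFT JOIN (SELECT DISTINCT timestamp, value FROM cal_5v_dig_rl) AS d
         ON d.timestamp = t.timestamp;
===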
Cheers,
Randall