From: James Klo <jklo(at)arkitec(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: make bulk deletes faster?
Date: 2005-12-18 05:10:40
Message-ID: jklo-C7336F.21104017122005@news.hub.org
Lists: pgsql-performance
I have the following table:
CREATE TABLE timeblock
(
timeblockid int8 NOT NULL,
starttime timestamp,
endtime timestamp,
duration int4,
blocktypeid int8,
domain_id int8,
create_date timestamp,
revision_date timestamp,
scheduleid int8,
CONSTRAINT timeblock_pkey PRIMARY KEY (timeblockid),
CONSTRAINT fk25629e03312570b FOREIGN KEY (blocktypeid)
REFERENCES blocktype (blocktypeid) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT fk25629e09be84177 FOREIGN KEY (domain_id)
REFERENCES wa_common_domain (domain_id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
)
WITH OIDS;
CREATE INDEX timeblock_blocktype_idx
ON timeblock
USING btree
(blocktypeid);
CREATE INDEX timeblock_date_idx
ON timeblock
USING btree
(starttime, endtime);
CREATE INDEX timeblockepoch_idx
ON timeblock
USING btree
(date_trunc('minute'::text, starttime), (date_part('epoch'::text,
date_trunc('minute'::text, starttime)) * 1000::double precision),
date_trunc('minute'::text, endtime), (date_part('epoch'::text,
date_trunc('minute'::text, endtime)) * 1000::double precision));
CREATE INDEX timeblockhourmin_idx
ON timeblock
USING btree
(date_part('hour'::text, starttime), date_part('minute'::text,
starttime), date_part('hour'::text, endtime), date_part('minute'::text,
endtime));
CREATE INDEX timeblockid_idx
ON timeblock
USING btree
(timeblockid);
There are also indexes on the primary keys of wa_common_domain and
blocktype. Deleting a single row is very fast:
explain analyze delete from timeblock where timeblockid = 666666;
Index Scan using timeblockid_idx on timeblock (cost=0.00..5.28 rows=1
width=6) (actual time=0.022..0.022 rows=0 loops=1)
Index Cond: (timeblockid = 666666)
Total runtime: 0.069 ms
I need to routinely move data from the timeblock table to an archive
table with the same schema, named timeblock_archive. I really need this
to happen as quickly as possible, as the archive operation appears to
seriously tax the db server...
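Conceptually, the move is just an INSERT ... SELECT into the archive
followed by a DELETE of the same rows, something like this (simplified
sketch; the cutoff predicate here is just a placeholder):

  begin;
  -- copy the rows to be archived
  insert into timeblock_archive
    select * from timeblock
     where starttime < '2005-11-01';  -- placeholder cutoff
  -- then remove them from the live table; this delete is the slow part
  delete from timeblock
   where starttime < '2005-11-01';
  commit;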
I'd like some suggestions on how to get the deletes to happen faster:
while deleting rows individually appears to be extremely fast, deleting
lots of rows takes an extremely long time to complete (5000 rows takes
about 3 minutes; 1,000,000 rows takes close to 4 hours or more,
depending upon server load; wall time, btw).
I've tried several different approaches to doing the delete and I can't
seem to make it much faster... anyone have any ideas?
The approaches I've taken all use a temp table to define the set of
rows that need to be deleted.
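The temp table is built roughly like this (simplified; the window
predicate here is illustrative, based on the archiveDailyData naming
below, and the real selection criteria live in the archive function):

  create temp table tmp_timeblock as
    select timeblockid, starttime
      from timeblock
     where starttime >= timestart                      -- illustrative:
       and starttime < timestart + interval '1 day';   -- one day per run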
Here's what I've tried:
Attempt 1:
----------
delete from timeblock where timeblockid in
    (select timeblockid from tmp_timeblock);
Attempt 2:
----------
num_to_delete := (select count(1) from tmp_timeblock);
RAISE DEBUG 'archiveDailyData(%): need to delete from timeblock [% rows]',
    timestart, num_to_delete;
cur_offset := 0;
while cur_offset < num_to_delete loop
    -- order by makes the limit/offset paging deterministic
    delete from timeblock where timeblockid in
        (select timeblockid from tmp_timeblock
          order by timeblockid
          limit 100 offset cur_offset);
    get diagnostics num_affected = ROW_COUNT;
    RAISE DEBUG 'archiveDailyData(%): delete from timeblock [% rows] cur_offset = %',
        timestart, num_affected, cur_offset;
    cur_offset := cur_offset + 100;
end loop;
Attempt 3:
----------
num_to_delete := (select count(1) from tmp_timeblock);
cur_offset := num_to_delete;
RAISE DEBUG 'archiveDailyData(%): need to delete from timeblock [% rows]',
    timestart, num_to_delete;
open del_cursor for select timeblockid from tmp_timeblock;
loop
    fetch del_cursor into del_pkey;
    if not found then
        exit;
    else
        delete from timeblock where timeblockid = del_pkey;
        get diagnostics num_affected = ROW_COUNT;
        cur_offset := cur_offset - num_affected;
        if cur_offset % 1000 = 0 then
            RAISE DEBUG 'archiveDailyData(%): delete from timeblock [% left]',
                timestart, cur_offset;
        end if;
    end if;
end loop;
close del_cursor;
I've considered using min(starttime) and max(starttime) from the temp
table and doing:
delete from timeblock where starttime between min and max;
however, I'm concerned about leaving orphan data, deleting too much
data, running into foreign key conflicts, etc.
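(One variation I suppose I could try: keep the exact id list, and use
the starttime range only as an extra, index-friendly filter, so it can
never delete more than what's in the temp table. Untested sketch:)

  delete from timeblock
   where starttime between (select min(starttime) from tmp_timeblock)
                       and (select max(starttime) from tmp_timeblock)
     and timeblockid in (select timeblockid from tmp_timeblock);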
Dropping the indexes on timeblock could be bad, as this table receives
a high volume of reads, inserts & updates.
Anyone have any suggestions?
Thanks,
Jim K