From: James Klo <jklo(at)arkitec(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: make bulk deletes faster?
Date: 2005-12-18 05:10:40
Message-ID: jklo-C7336F.21104017122005@news.hub.org
Lists: pgsql-performance
I have the following table:
CREATE TABLE timeblock
(
timeblockid int8 NOT NULL,
starttime timestamp,
endtime timestamp,
duration int4,
blocktypeid int8,
domain_id int8,
create_date timestamp,
revision_date timestamp,
scheduleid int8,
CONSTRAINT timeblock_pkey PRIMARY KEY (timeblockid),
CONSTRAINT fk25629e03312570b FOREIGN KEY (blocktypeid)
REFERENCES blocktype (blocktypeid) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT fk25629e09be84177 FOREIGN KEY (domain_id)
REFERENCES wa_common_domain (domain_id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
)
WITH OIDS;
CREATE INDEX timeblock_blocktype_idx
ON timeblock
USING btree
(blocktypeid);
CREATE INDEX timeblock_date_idx
ON timeblock
USING btree
(starttime, endtime);
CREATE INDEX timeblockepoch_idx
ON timeblock
USING btree
(date_trunc('minute'::text, starttime), (date_part('epoch'::text,
date_trunc('minute'::text, starttime)) * 1000::double precision),
date_trunc('minute'::text, endtime), (date_part('epoch'::text,
date_trunc('minute'::text, endtime)) * 1000::double precision));
CREATE INDEX timeblockhourmin_idx
ON timeblock
USING btree
(date_part('hour'::text, starttime), date_part('minute'::text,
starttime), date_part('hour'::text, endtime), date_part('minute'::text,
endtime));
CREATE INDEX timeblockid_idx
ON timeblock
USING btree
(timeblockid);
There are also indexes on the primary keys of wa_common_domain and
blocktype. Deleting a single row is very fast:
explain analyze delete from timeblock where timeblockid = 666666;
Index Scan using timeblockid_idx on timeblock (cost=0.00..5.28 rows=1
width=6) (actual time=0.022..0.022 rows=0 loops=1)
Index Cond: (timeblockid = 666666)
Total runtime: 0.069 ms
I need to routinely move data from the timeblock table to an archive
table with the same schema, named timeblock_archive. I really need this
to happen as quickly as possible, as the archive operation appears to
seriously tax the db server...
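Conceptually, the move is just an INSERT ... SELECT into the archive
followed by a DELETE of the same rows, something like this (simplified
sketch; the cutoff predicate here is just a placeholder):

  begin;
  -- copy the rows to be archived
  insert into timeblock_archive
    select * from timeblock
     where starttime < '2005-11-01';  -- placeholder cutoff
  -- then remove them from the live table; this delete is the slow part
  delete from timeblock
   where starttime < '2005-11-01';
  commit;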
I'd like some suggestions on how to get the deletes to happen faster:
while deleting rows individually appears to be extremely fast, deleting
lots of rows takes an extremely long time to complete (5000 rows takes
about 3 minutes; 1,000,000 rows takes close to 4 hours or more,
depending upon server load; wall time, btw).
I've tried several different approaches to doing the delete and I can't
seem to make it much faster... anyone have any ideas?
The approaches I've taken all use a temp table to define the set of
rows that need to be deleted.
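The temp table is built roughly like this (simplified; the window
predicate here is illustrative, based on the archiveDailyData naming
below, and the real selection criteria live in the archive function):

  create temp table tmp_timeblock as
    select timeblockid, starttime
      from timeblock
     where starttime >= timestart                      -- illustrative:
       and starttime < timestart + interval '1 day';   -- one day per run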
Here's what I've tried:
Attempt 1:
----------
delete from timeblock where timeblockid in
    (select timeblockid from tmp_timeblock);
Attempt 2:
----------
num_to_delete := (select count(1) from tmp_timeblock);
RAISE DEBUG 'archiveDailyData(%): need to delete from timeblock [% rows]',
    timestart, num_to_delete;
cur_offset := 0;
while cur_offset < num_to_delete loop
    -- order by makes the limit/offset paging deterministic
    delete from timeblock where timeblockid in
        (select timeblockid from tmp_timeblock
          order by timeblockid
          limit 100 offset cur_offset);
    get diagnostics num_affected = ROW_COUNT;
    RAISE DEBUG 'archiveDailyData(%): delete from timeblock [% rows] cur_offset = %',
        timestart, num_affected, cur_offset;
    cur_offset := cur_offset + 100;
end loop;
Attempt 3:
----------
num_to_delete := (select count(1) from tmp_timeblock);
cur_offset := num_to_delete;
RAISE DEBUG 'archiveDailyData(%): need to delete from timeblock [% rows]',
    timestart, num_to_delete;
open del_cursor for select timeblockid from tmp_timeblock;
loop
    fetch del_cursor into del_pkey;
    if not found then
        exit;
    else
        delete from timeblock where timeblockid = del_pkey;
        get diagnostics num_affected = ROW_COUNT;
        cur_offset := cur_offset - num_affected;
        if cur_offset % 1000 = 0 then
            RAISE DEBUG 'archiveDailyData(%): delete from timeblock [% left]',
                timestart, cur_offset;
        end if;
    end if;
end loop;
close del_cursor;
I've considered using min(starttime) and max(starttime) from the temp
table and doing:
delete from timeblock where starttime between min and max;
however, I'm concerned about leaving orphan data, deleting too much
data, running into foreign key conflicts, etc.
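(One variation I suppose I could try: keep the exact id list, and use
the starttime range only as an extra, index-friendly filter, so it can
never delete more than what's in the temp table. Untested sketch:)

  delete from timeblock
   where starttime between (select min(starttime) from tmp_timeblock)
                       and (select max(starttime) from tmp_timeblock)
     and timeblockid in (select timeblockid from tmp_timeblock);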
Dropping the indexes on timeblock could be bad, as this table receives
a high volume of reads, inserts & updates.
Anyone have any suggestions?
Thanks,
Jim K