
From: Chris Browne <cbbrowne(at)acm(dot)org>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: DELETE FROM takes forever
Date: 2011-02-10 18:51:14
Message-ID: 87pqqzyb8d.fsf@cbbrowne.afilias-int.info

slushie(at)gmail(dot)com (Josh) writes:
> I'm trying to do a DELETE FROM on my large table (about 800 million
> rows) based on the contents of another, moderately large table (about
> 110 million rows). The command I'm using is:
>
> DELETE FROM records WHERE id NOT IN (SELECT id FROM unique_records);
>
> This process ran for about two weeks before I decided to stop it -- it
> was dragging down the DB server. I can understand long-running
> processes, but two weeks seems a bit much even for a big table.
>
> Is this the best way to approach the problem? Is there a better way?
>
> Some background: The server is version 8.3, running nothing but Pg.
> The 'records' table has 'id' as its primary key, and one other index
> on another column. The table is referenced by just about every other
> table in my DB (about 15 other tables) via foreign key constraints,
> which I don't want to break (which is why I'm not just recreating the
> table rather than deleting rows). Most of the dependent tables have ON
> DELETE CASCADE. The 'unique_records' table is a temp table I got via
> something like: SELECT DISTINCT (other_column) id INTO unique_records
> FROM records

I'd be really inclined to do this incrementally, to trim out a few
thousand at a time, if at all possible.

Every row you delete fires the foreign key constraint triggers on the
dependent tables (and the ON DELETE CASCADE deletes fire triggers of
their own), so across hundreds of millions of rows that per-row work
winds up being just huge.
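
If you want to see just how much of that there is, pg_constraint will
list every foreign key pointing at the table and whether it cascades;
a quick sketch, assuming the table really is named 'records':

select conname, conrelid::regclass as referencing_table,
       confdeltype   -- 'c' = cascade, 'a' = no action, 'r' = restrict
  from pg_constraint
 where contype = 'f'
   and confrelid = 'records'::regclass;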

What I might do in such a case is to construct a table that contains all
the key values that ought to get trimmed, something like this:

select id into records_to_delete
  from records where id not in (select id from unique_records);
create index rtd_id on records_to_delete (id);
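
One caveat: with 110 million ids in the subquery, that not-in can be
painful all by itself if the hashed subplan doesn't fit in work_mem,
in which case the planner falls back to rescanning the subquery for
every outer row. A left-join anti-join is one alternative worth
trying; this is only a sketch, and it assumes id is not null on both
sides:

select r.id into records_to_delete
  from records r
       left join unique_records u on u.id = r.id
 where u.id is null;
-- then build the rtd_id index on it as above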

Then loop on the following set of queries:

drop table if exists records_presently_being_deleted;
select id into temp table records_presently_being_deleted
  from records_to_delete limit 5000;
delete from records
 where id in (select id from records_presently_being_deleted)
   and id not in (select id from unique_records);
delete from records_to_delete
 where id in (select id from records_presently_being_deleted);

That'll drop out 5000 records at a time, you'll have no
ultra-long-running transactions, and you'll get regular feedback that
it's doing work for you.
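
While that's running, a count on the work table tells you how far
along you are, and once it hits zero the big table deserves a vacuum
so the dead space gets reused and the stats are current. Roughly
(which dependent tables to vacuum is a guess at your schema):

select count(*) from records_to_delete;  -- rows still waiting to be purged

-- once that reaches zero:
vacuum analyze records;
-- the tables that cascade off records will have shed rows too and
-- will want the same treatment
drop table records_to_delete;
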
--
wm(X,Y):-write(X),write('@'),write(Y). wm('cbbrowne','linuxdatabases.info').
http://linuxfinances.info/info/linuxdistributions.html
"A ROUGH WHIMPER OF INSANITY" is an anagram for "INFORMATION
SUPERHIGHWAY".
