Re: Deleting Rows From Large Tables

From: Greg Spiegelberg <gspiegelberg(at)gmail(dot)com>
To: Rob Emery <re-pgsql(at)codeweavers(dot)net>
Cc: pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Deleting Rows From Large Tables
Date: 2013-05-19 21:14:07
Message-ID: CAEtnbpUyEs10PuBng+h3jEyhHo+ff4a70mP7mSf-4prhSmAHDg@mail.gmail.com
Lists: pgsql-performance

Rob,

I'm going to make half the list cringe at this suggestion, though I have
used it successfully.

If you can guarantee that the table will not be vacuumed during this
cleanup, and that the rows you want deleted will not be updated (an update
moves a row to a new ctid, and a vacuum can let old ctids be reused), I
would suggest using the ctid column to drive the delete. Using the simple
transaction below, I have witnessed a DELETE move much more quickly than
one keyed on the PK or any other indexed column.

BEGIN;
SELECT ctid INTO TEMP TABLE ctids_to_be_deleted
  FROM my_big_table
 WHERE *delete criteria*;
DELETE FROM my_big_table bt
 USING ctids_to_be_deleted dels
 WHERE bt.ctid = dels.ctid;
COMMIT;
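
To keep any single pass short (Rob mentions below that sustained deletes
hurt the other services on the box), the same idea can be run in bounded
batches. A minimal sketch: the LIMIT of 100000 is an arbitrary batch size
for illustration, and the outer loop rerunning the transaction until the
DELETE touches zero rows would live in a shell script or cron job:

BEGIN;
SELECT ctid INTO TEMP TABLE ctids_to_be_deleted
  FROM my_big_table
 WHERE *delete criteria*
 LIMIT 100000;                   -- arbitrary batch size; tune as needed
DELETE FROM my_big_table bt
 USING ctids_to_be_deleted dels
 WHERE bt.ctid = dels.ctid;
DROP TABLE ctids_to_be_deleted;  -- so the next pass can recreate it
COMMIT;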

HTH.
-Greg

On Fri, May 17, 2013 at 5:26 AM, Rob Emery <re-pgsql(at)codeweavers(dot)net> wrote:

> Hi All,
>
> We've got 3 quite large tables that, due to an unexpected surge in
> usage (!), have grown to about 10GB each, with 72, 32 and 31 million
> rows in them. I've been tasked with cleaning out about half of that
> data; the problem I've got is that even deleting the first 1,000,000
> rows seems to take an unreasonable amount of time. Unfortunately this
> is on quite an old server (a Dell 2950 with a RAID-10 across 6 disks)
> running Postgres 8.4, which also serves other things like our logging
> systems.
>
> If I run a sustained delete (more than about 5 minutes) it has a
> detrimental effect on the other services. I'm trying to batch up the
> deletes into small chunks of approximately 1 month of data; even this
> seems to take too long. I originally reduced the window to a single
> day's data and had the same problem. I can keep shrinking the window
> I'm deleting, but I feel I must be doing something either fundamentally
> wrong or over-complicating this enormously. I've switched over to
> retrieving a list of IDs to delete, storing them in temporary tables,
> and deleting based on the primary key of each table with something
> similar to this:
>
> BEGIN TRANSACTION;
>
> CREATE TEMPORARY TABLE table_a_ids_to_delete (id INT);
> CREATE TEMPORARY TABLE table_b_ids_to_delete (id INT);
>
> INSERT INTO table_a_ids_to_delete
> SELECT id FROM table_a WHERE purchased = '-infinity' AND created_at
> < '2007-01-01T00:00:00';
>
> INSERT INTO table_b_ids_to_delete
> SELECT table_b_id FROM table_a_table_b_xref
> INNER JOIN table_a_ids_to_delete
> ON (table_a_ids_to_delete.id = table_a_table_b_xref.table_a_id);
>
> DELETE FROM table_a_table_b_xref USING table_a_ids_to_delete
> WHERE table_a_table_b_xref.table_a_id = table_a_ids_to_delete.id;
>
> DELETE FROM table_b USING table_b_ids_to_delete
> WHERE table_b.id = table_b_ids_to_delete.id;
>
> DELETE FROM table_a USING table_a_ids_to_delete
> WHERE table_a.id = table_a_ids_to_delete.id;
>
> COMMIT;
>
> There are indices on table_a on the queried columns, table_b's primary
> key is its id, and table_a_table_b_xref has an index on (table_a_id,
> table_b_id). There are FKs defined on the xref table, which is why I'm
> deleting from it first.
>
> Does anyone have any ideas as to what I can do to make the deletes any
> faster? I'm running out of ideas!
>
> Thanks in advance,
>
> --
> Rob Emery
>
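
One more thing worth checking, assuming the FKs on the xref table
reference both table_a and table_b: when you DELETE FROM table_b, each
deleted row triggers a check that no table_a_table_b_xref row still
references it. The existing (table_a_id, table_b_id) index does not help
there because table_b_id is not its leading column, so each of those
per-row checks can fall back to a sequential scan of the xref table. If
that matches your schema, an index leading on table_b_id should help; a
minimal sketch (index name hypothetical):

-- CONCURRENTLY avoids blocking writes while the index builds
CREATE INDEX CONCURRENTLY table_a_table_b_xref_table_b_id_idx
    ON table_a_table_b_xref (table_b_id);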
