Re: Deleting Rows From Large Tables

From: Sékine Coulibaly <scoulibaly(at)gmail(dot)com>
To:
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Deleting Rows From Large Tables
Date: 2013-05-17 14:26:00
Message-ID: CAD8n-Fq4t2QwcNFh-A5mXKrfpr6Hh-stFw-5PZ8QPYCWzUKF8w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Oh, sorry, overlooked that part.
Maybe refreshing stats with VACUUM FULL ?

2013/5/17 Robert Emery <robertemery(at)codeweavers(dot)net>

> Hi Sékine,
>
> Unfortunately I'm not trying to empty the table completely, just
> delete about 10-15% of the data in it.
>
> Thanks,
>
> On 17 May 2013 14:11, Sékine Coulibaly <scoulibaly(at)gmail(dot)com> wrote:
> > Rob,
> >
> > Did you tried TRUNCATE ?
> > http://www.postgresql.org/docs/8.4/static/sql-truncate.html
> >
> > This is is supposed to be quicker since it does scan the table.
> >
> > Regards
> >
> >
> > 2013/5/17 Rob Emery <re-pgsql(at)codeweavers(dot)net>
> >>
> >> Hi All,
> >>
> >> We've got 3 quite large tables that due to an unexpected surge in
> >> usage (!) have grown to about 10GB each, with 72, 32 and 31 million
> >> rows in. I've been tasked with cleaning out about half of them, the
> >> problem I've got is that even deleting the first 1,000,000 rows seems
> >> to take an unreasonable amount of time. Unfortunately this is on quite
> >> an old server (Dell 2950 with a RAID-10 over 6 disks) running Postgres
> >> 8.4; which serves other things like our logging systems.
> >>
> >> If I run a sustained (more than about 5 minutes) delete it'll have a
> >> detrimental effect on the other services. I'm trying to batch up the
> >> deletes into small chunks of approximately 1 month of data ; even this
> >> seems to take too long, I originally reduced this down to a single
> >> day's data and had the same problem. I can keep decreasing the size of
> >> the window I'm deleting but I feel I must be doing something either
> >> fundamentally wrong or over-complicating this enormously. I've
> >> switched over to retrieving a list of IDs to delete, storing them in
> >> temporary tables and deleting based on the primary keys on each of the
> >> tables with something similar to this:
> >>
> >> BEGIN TRANSACTION;
> >>
> >> CREATE TEMPORARY TABLE table_a_ids_to_delete (id INT);
> >> CREATE TEMPORARY TABLE table_b_ids_to_delete (id INT);
> >>
> >> INSERT INTO table_a_ids_to_delete
> >> SELECT id FROM table_a WHERE purchased ='-infinity' AND created_at
> >> < '2007-01-01T00:00:00';
> >>
> >> INSERT INTO table_b_ids_to_delete
> >> SELECT table_b_id FROM table_a_table_b_xref
> >> INNER JOIN table_a_ids_to_delete ON (table_a_ids_to_delete.id =
> >> table_a_table_b.quote_id);
> >>
> >> DELETE FROM table_a_table_b_xref USING table_a_ids_to_delete
> >> WHERE table_a_table_b_xref.table_a_id = table_a_ids_to_delete.id;
> >>
> >> DELETE FROM table_b USING table_b_ids_to_delete
> >> WHERE table_b.id = table_b_ids_to_delete.id;
> >>
> >> DELETE FROM table_a USING table_a_ids_to_delete
> >> WHERE table_a.id = table_a_ids_to_delete.id;
> >>
> >> COMMIT;
> >>
> >> There're indices on table_a on the queried columns, table_b's primary
> >> key is it's id, and table_a_table_b_xref has an index on (table_a_id,
> >> table_b_id). There're FK defined on the xref table, hence why I'm
> >> deleting from it first.
> >>
> >> Does anyone have any ideas as to what I can do to make the deletes any
> >> faster? I'm running out of ideas!
> >>
> >> Thanks in advance,
> >>
> >> --
> >> Rob Emery
> >>
> >>
> >> --
> >> Sent via pgsql-performance mailing list (
> pgsql-performance(at)postgresql(dot)org)
> >> To make changes to your subscription:
> >> http://www.postgresql.org/mailpref/pgsql-performance
> >
> >
>
>
>
> --
> Robert Emery
> Database Administrator
>
> | T: 0800 021 0888 | www.codeweavers.net |
> | Codeweavers Limited | Barn 4 | Dunston Business Village | Dunston | ST18
> 9AB |
> | Registered in England and Wales No. 04092394 | VAT registration no.
> 974 9705 63 |
>
> CUSTOMERS' BLOG TWITTER FACEBOOK LINKED IN
> DEVELOPERS' BLOG YOUTUBE
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Vitalii Tymchyshyn 2013-05-18 07:15:26 Re: Deleting Rows From Large Tables
Previous Message Merlin Moncure 2013-05-17 13:19:17 Re: Reliability with RAID 10 SSD and Streaming Replication