Re: Fast way to delete big table?

From: Haiming Zhang <Haiming(dot)Zhang(at)redflex(dot)com(dot)au>
To: Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Fast way to delete big table?
Date: 2016-05-16 05:33:45
Message-ID: 59f82da092904950b63e3601d15851b8@RTS-EXCHANGE01.traffic.redflex.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi Gavin,

Thanks for the suggestion. What you said was what I tried to do (except the last reindexes, planned to do it when the table is cleaned up), however it was too slow. I have run for two days, nothing much happened.

Truncate does not work for my purpose.

Regards,
Haiming

[cid:logo_new_96ac6d31-abb9-4f10-a3b7-74542f8c4a5f1111111111111111111111111111111.png]

Haiming Zhang | Engineer | Redflex Group
T +61 3 9674 1868 | F +61 3 9699 3566 | E Haiming(dot)Zhang(at)redflex(dot)com(dot)au
31 Market Street, (PO Box 720), South Melbourne VIC 3205, Australia

If you are not an authorised recipient of this e-mail, please contact me at Redflex immediately by return phone call or by email. In this case, you should not read, print, retransmit, store or act in reliance on this e-mail or any attachments, and should destroy all copies of them. This e-mail and any attachments are confidential and may contain privileged information and/or copyright material of Redflex or third parties. You should only retransmit, distribute or commercialise the material if you are authorised to do so. This notice should not be removed.

-----Original Message-----
From: Gavin Flower [mailto:GavinFlower(at)archidevsys(dot)co(dot)nz]
Sent: Monday, 16 May 2016 3:24 PM
To: Haiming Zhang <Haiming(dot)Zhang(at)redflex(dot)com(dot)au>; pgsql-general(at)postgresql(dot)org
Subject: Re: [GENERAL] Fast way to delete big table?

On 16/05/16 17:09, Haiming Zhang wrote:
>
> Hi All,
>
> I have a big table (200G with 728 million records), the table slows
> down lots of things. It's time to clean the data up. The requirement
> is when I delete I must summarise few columns to a new table for
> backup purpose. So that mean I need to go through each row and add the
> columns' value to the summary table (the corresponding category).
>
> The table has indexes (multicoumn indexes) before, I removed the
> indexes because I read on forum says the indexes *heavily* reduce the
> speed of deleting. That's true if I only delete from the table, but my
> case is I first SELECT this row, then add few values to the summary
> table and then DELETE the row according to its multiple primary key.
> This seems to be a bad idea now as it takes very long time to DELETE
> the row (as delete needs to find the row first).
>
> Here are the two version of the delete functions, please help to point
> out how can I speed it up.
>
> 1.
>
> CREATE OR REPLACE FUNCTION summary_delete_table()
>
> RETURNS integer AS
>
> $BODY$
>
> DECLARE
>
> rec RECORD;
>
> subrec RECORD;
>
> BEGIN
>
> FOR rec IN SELECT * FROM tableA limit 100 LOOP
>
> BEGIN
>
> UPDATE summaryTable set count1 = rec.count1 + rec.count1, count2 =...
> where category match;
>
> delete from tableA where tableA.primaryKeyA=rec.primaryKeyA and
> tableA.primaryKeyB=rec.primaryKeyB;
>
> END;
>
> END LOOP;
>
> return 1;
>
> END;
>
> $BODY$
>
> LANGUAGE plpgsql VOLATILE
>
> COST 100;
>
> And then I have a .bat script to loop the above function million times.
>
> 2.
>
> CREATE OR REPLACE FUNCTION summary_delete_table()
>
> RETURNS integer AS
>
> $BODY$
>
> DECLARE
>
> rec RECORD;
>
> td_cursor CURSOR FOR SELECT * FROM tableA limit 100;
>
> BEGIN
>
> FOR rec IN td_cursor LOOP
>
> BEGIN
>
> UPDATE summaryTable set count1 = rec.count1 + rec.count1, count2 =...
> where category match;
>
> delete from tableA WHERE CURRENT OF td_cursor;
>
> END;
>
> END LOOP;
>
> return 1;
>
> END;
>
> $BODY$
>
> LANGUAGE plpgsql VOLATILE
>
> COST 100;
>
> Method 2 is bit faster but not much, the delete speed is 2478 rows/s
> for method 2 and 2008 rows/s for method 1.
>
> Any suggestions are welcomed.
>
> BTW, I guess if reindex, it may take few days to finish.
>
> Also, I tried change delete 100 rows at a time and 1000, 2000. The
> result showed 1000 is faster than 100 and 2000 a time.
>
> Thanks and Regards,
>
> Haiming
>
[...]

In one transaction:

1. populate the summary table
2. drop indexes on tableA
3. delete selected entries in tableA
4. recreate indexes for tableA

If deleting all entries, then simply truncate it!

N.B. I have NOT checked the fine print in the documentation, nor tested this - so be warned! :-)

Cheers,
Gavin

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Haiming Zhang 2016-05-16 05:35:44 Re: Fast way to delete big table?
Previous Message John R Pierce 2016-05-16 05:28:12 Re: Fast way to delete big table?