From: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
To: Haiming Zhang <Haiming(dot)Zhang(at)redflex(dot)com(dot)au>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Fast way to delete big table?
Date: 2016-05-16 13:50:27
Message-ID: 1edc1b66-8133-4095-faf3-11db2021b705@aklaver.com
Lists: pgsql-general
On 05/15/2016 10:33 PM, Haiming Zhang wrote:
> Hi Gavin,
>
> Thanks for the suggestion. What you said was what I tried to do (except
> the last reindexes, planned to do it when the table is cleaned up),
> however it was too slow. I have run for two days, nothing much happened.
From your original post I gathered that you did not do what Gavin 
suggested, in that you dropped the indexes first and then combined the 
UPDATE of the summary and the DELETE of the row into one operation. So 
I am a little confused about what you are asking. Are you looking for 
suggestions on what to do in the future, on how to make the existing 
situation (no indexes on the big table) work better, or both?
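
Whatever the answer, one thing that should help either way is to 
combine the summarize and the delete into a single set-based statement 
per batch, instead of looping row by row. A sketch only, untested; it 
assumes tableA has a category column and summaryTable already has a 
row for each category:

WITH doomed AS (
    DELETE FROM tableA
    WHERE ctid IN (SELECT ctid FROM tableA LIMIT 10000)
    RETURNING category, count1, count2
),
sums AS (
    SELECT category, sum(count1) AS s1, sum(count2) AS s2
    FROM doomed
    GROUP BY category
)
UPDATE summaryTable st
   SET count1 = st.count1 + sums.s1,
       count2 = st.count2 + sums.s2
  FROM sums
 WHERE st.category = sums.category;

Each run then replaces 10000 separate UPDATE/DELETE pairs with one 
statement.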
>
> Truncate does not work for my purpose.
>
> Regards,
> Haiming
>
> *Haiming Zhang* | Engineer | *Redflex Group*
> *T* +61 3 9674 1868 | *F* +61 3 9699 3566 | *E*
> Haiming(dot)Zhang(at)redflex(dot)com(dot)au
> 31 Market Street, (PO Box 720), South Melbourne VIC 3205, Australia
>
> -----Original Message-----
> From: Gavin Flower [mailto:GavinFlower(at)archidevsys(dot)co(dot)nz]
> Sent: Monday, 16 May 2016 3:24 PM
> To: Haiming Zhang <Haiming(dot)Zhang(at)redflex(dot)com(dot)au>;
> pgsql-general(at)postgresql(dot)org
> Subject: Re: [GENERAL] Fast way to delete big table?
>
> On 16/05/16 17:09, Haiming Zhang wrote:
>>
>> Hi All,
>>
>> I have a big table (200G with 728 million records) that slows
>> down lots of things. It's time to clean the data up. The requirement
>> is that when I delete, I must summarise a few columns into a new
>> table for backup purposes. That means I need to go through each row
>> and add the columns' values to the summary table (under the
>> corresponding category).
>>
>> The table had indexes (multicolumn indexes) before; I removed the
>> indexes because I read on a forum that indexes *heavily* reduce the
>> speed of deleting. That's true if I only delete from the table, but
>> in my case I first SELECT each row, then add a few values to the
>> summary table, and then DELETE the row by its multicolumn primary
>> key. This now seems to have been a bad idea, as it takes a very long
>> time to DELETE the row (the delete needs to find the row first).
>>
>> Here are the two versions of the delete function; please help point
>> out how I can speed them up.
>>
>> 1.
>>
>> CREATE OR REPLACE FUNCTION summary_delete_table()
>>   RETURNS integer AS
>> $BODY$
>> DECLARE
>>     rec RECORD;
>> BEGIN
>>     FOR rec IN SELECT * FROM tableA LIMIT 100 LOOP
>>         UPDATE summaryTable
>>            SET count1 = count1 + rec.count1, count2 = ...
>>          WHERE category = rec.category;  -- the corresponding category
>>         DELETE FROM tableA
>>          WHERE tableA.primaryKeyA = rec.primaryKeyA
>>            AND tableA.primaryKeyB = rec.primaryKeyB;
>>     END LOOP;
>>     RETURN 1;
>> END;
>> $BODY$
>> LANGUAGE plpgsql VOLATILE
>> COST 100;
>>
>> And then I have a .bat script that loops the above function a million times.
>>
>> 2.
>>
>> CREATE OR REPLACE FUNCTION summary_delete_table()
>>   RETURNS integer AS
>> $BODY$
>> DECLARE
>>     rec RECORD;
>>     td_cursor CURSOR FOR SELECT * FROM tableA LIMIT 100;
>> BEGIN
>>     FOR rec IN td_cursor LOOP
>>         UPDATE summaryTable
>>            SET count1 = count1 + rec.count1, count2 = ...
>>          WHERE category = rec.category;  -- the corresponding category
>>         DELETE FROM tableA WHERE CURRENT OF td_cursor;
>>     END LOOP;
>>     RETURN 1;
>> END;
>> $BODY$
>> LANGUAGE plpgsql VOLATILE
>> COST 100;
>>
>> Method 2 is a bit faster, but not by much: the delete speed is
>> 2478 rows/s for method 2 and 2008 rows/s for method 1.
>>
>> Any suggestions are welcomed.
>>
>> BTW, I guess that if I reindex, it may take a few days to finish.
>>
>> Also, I tried deleting 100, 1000, and 2000 rows at a time. The
>> results showed that 1000 at a time is faster than either 100 or 2000.
>>
>> Thanks and Regards,
>>
>> Haiming
>>
> [...]
>
> In one transaction (see the sketch below):
>
> 1. populate the summary table
> 2. drop indexes on tableA
> 3. delete selected entries in tableA
> 4. recreate indexes for tableA
>
> If deleting all entries, then simply truncate it!
>
> N.B. I have NOT checked the fine print in the documentation, nor tested
> this - so be warned! :-)
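>
> For example, something along these lines (again a sketch only, not
> tested; the index definition, summary columns, and delete criteria
> are placeholders for the real ones):
>
> BEGIN;
>
> -- 1. populate the summary table
> INSERT INTO summaryTable (category, count1, count2)
>     SELECT category, sum(count1), sum(count2)
>     FROM tableA
>     GROUP BY category;
>
> -- 2. drop indexes on tableA (one DROP per index)
> DROP INDEX tablea_multicol_idx;
>
> -- 3. delete selected entries in tableA, in one set-based statement
> DELETE FROM tableA WHERE <criteria for the rows being removed>;
>
> -- 4. recreate indexes for tableA
> CREATE INDEX tablea_multicol_idx ON tableA (primaryKeyA, primaryKeyB);
>
> COMMIT;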
>
>
> Cheers,
> Gavin
>
>
--
Adrian Klaver
adrian(dot)klaver(at)aklaver(dot)com