Re: Very slow update statement on 40mio rows

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Florian Schröck <fschroeck(at)aycan(dot)de>
Cc: Kevin Grittner <kgrittn(at)ymail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Very slow update statement on 40mio rows
Date: 2013-02-19 20:04:22
Message-ID: CAHyXU0zpEgV1FsekySuz8YWKbUHXK0eGQpcThNEac8U6jSyfRA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Fri, Feb 15, 2013 at 9:32 AM, Florian Schröck <fschroeck(at)aycan(dot)de> wrote:
> Hello Kevin,
> not updating every row which doesn't need the update solved the problem!
> Your query took only 1 minute. :)
>
> Thank you so much for the fast response, have a great weekend!
>
> PS: When you switch to "TEXT" on the explain URL you can see the final
> runtime which was 66 minutes with the original statement.
>
> Best regards,
> Florian
>
> On 02/15/2013 03:59 PM, Kevin Grittner wrote:
>> Florian Schröck <fschroeck(at)aycan(dot)de> wrote:
>>
>>> UPDATE BackupFiles
>>> SET nTapeNr=0, nAFIOCounter=0, nBlockCounter=0,
>>> cStatus='NEW'::StatusT, bOnSetBlue=false,
>>> bOnSetYellow=false, nLastBackupTS= '0001-01-01 00:00:00'
>>> WHERE cStatus='NEW' OR cStatus='WRITING' OR cStatus='ONTAPE';
>>>
>>> Explain analyze: http://explain.depesz.com/s/8y5
>>> The statement takes 60-90 minutes.
>> The EXPLAIN ANALYZE at that URL shows a runtime of 3 minutes and 41
>> seconds.
>>
>>> I tried to optimize the settings but until now without success.
>>>
>>> Can we optimize this update statement somehow? Do you have any
>>> other ideas?
>> Are there any rows which would already have the values that you are
>> setting? If so, it would be faster to skip those by using this
>> query:
>>
>> UPDATE BackupFiles
>> SET nTapeNr=0, nAFIOCounter=0, nBlockCounter=0,
>> cStatus='NEW'::StatusT, bOnSetBlue=false,
>> bOnSetYellow=false, nLastBackupTS= '0001-01-01 00:00:00'
>> WHERE (cStatus='NEW' OR cStatus='WRITING' OR cStatus='ONTAPE')
>> AND (nTapeNr <> 0 OR nAFIOCounter <> 0 OR nBlockCounter <> 0
>> OR cStatus <> 'NEW'::StatusT
>> OR bOnSetBlue IS DISTINCT FROM false
>> OR bOnSetYellow IS DISTINCT FROM false
>> OR nLastBackupTS <> '0001-01-01 00:00:00');
>>
>> Another way to accomplish this is with the
>> suppress_redundant_updates_trigger() trigger function:
>>
>> http://www.postgresql.org/docs/9.2/interactive/functions-trigger.html

if the number of rows you actually update is not very large relative
to size of the table, just for fun, try this:

CREATE OR REPLACE FUNCTION BakupFilesCandidateReset(BackupFiles)
RETURNS BOOL AS
$$
SELECT ($1).cStatus IN('NEW', 'WRITING', 'ONTAPE')
AND
(($1).nTapeNr, ($1).nAFIOCounter, ($1).nBlockCounter,
($1).cStatus, ($1).bOnSetBlue, ($1).bOnSetYellow, ($1).nLastBackupTS)
IS DISTINCT FROM /* simple != will suffice if values are never null */
(0, 0, 0, 'NEW'::StatusT, false, false, '0001-01-01 00:00:00');
$$ LANGUAGE SQL IMMUTABLE;

CREATE INDEX ON BackupFiles(BakupFilesCandidateReset(BackupFiles))
WHERE BakupFilesCandidateReset(BackupFiles);

SELECT * FROM BackupFiles WHERE BakupFilesCandidateReset(BackupFiles);
UPDATE BackupFiles SET ... WHERE BakupFilesCandidateReset(BackupFiles);
etc

idea here is to maintain partial boolean index representing candidate
records to update. plus it's nifty. this is basic mechanism that
can be used as foundation for very fast push pull queues.

merlin

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Mark Kirkwood 2013-02-19 23:17:22 Re: High CPU usage / load average after upgrading to Ubuntu 12.04
Previous Message Josh Berkus 2013-02-19 17:51:16 Re: High CPU usage / load average after upgrading to Ubuntu 12.04