Re: Very slow update statement on 40mio rows

From: Florian Schröck <fschroeck(at)aycan(dot)de>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Very slow update statement on 40mio rows
Date: 2013-02-26 08:01:14
Message-ID: 512C6BCA.1060407@aycan.de
Lists: pgsql-performance

Hello Merlin,
thank you for the feedback. I have forwarded this to my developer; it is an
interesting approach.

--

Best regards

Florian Schröck

On 02/19/2013 09:04 PM, Merlin Moncure wrote:
> On Fri, Feb 15, 2013 at 9:32 AM, Florian Schröck <fschroeck(at)aycan(dot)de> wrote:
>> Hello Kevin,
>> Skipping every row that doesn't need the update solved the problem!
>> Your query took only 1 minute. :)
>>
>> Thank you so much for the fast response, have a great weekend!
>>
>> PS: When you switch to "TEXT" on the explain URL, you can see the final
>> runtime, which was 66 minutes with the original statement.
>>
>> Best regards,
>> Florian
>>
>> On 02/15/2013 03:59 PM, Kevin Grittner wrote:
>>> Florian Schröck <fschroeck(at)aycan(dot)de> wrote:
>>>
>>>> UPDATE BackupFiles
>>>> SET nTapeNr=0, nAFIOCounter=0, nBlockCounter=0,
>>>> cStatus='NEW'::StatusT, bOnSetBlue=false,
>>>> bOnSetYellow=false, nLastBackupTS= '0001-01-01 00:00:00'
>>>> WHERE cStatus='NEW' OR cStatus='WRITING' OR cStatus='ONTAPE';
>>>>
>>>> Explain analyze: http://explain.depesz.com/s/8y5
>>>> The statement takes 60-90 minutes.
>>> The EXPLAIN ANALYZE at that URL shows a runtime of 3 minutes and 41
>>> seconds.
>>>
>>>> I tried to optimize the settings but until now without success.
>>>>
>>>> Can we optimize this update statement somehow? Do you have any
>>>> other ideas?
>>> Are there any rows which would already have the values that you are
>>> setting? If so, it would be faster to skip those by using this
>>> query:
>>>
>>> UPDATE BackupFiles
>>> SET nTapeNr=0, nAFIOCounter=0, nBlockCounter=0,
>>> cStatus='NEW'::StatusT, bOnSetBlue=false,
>>> bOnSetYellow=false, nLastBackupTS= '0001-01-01 00:00:00'
>>> WHERE (cStatus='NEW' OR cStatus='WRITING' OR cStatus='ONTAPE')
>>> AND (nTapeNr <> 0 OR nAFIOCounter <> 0 OR nBlockCounter <> 0
>>> OR cStatus <> 'NEW'::StatusT
>>> OR bOnSetBlue IS DISTINCT FROM false
>>> OR bOnSetYellow IS DISTINCT FROM false
>>> OR nLastBackupTS <> '0001-01-01 00:00:00');
>>>
>>> Another way to accomplish this is with the
>>> suppress_redundant_updates_trigger() trigger function:
>>>
>>> http://www.postgresql.org/docs/9.2/interactive/functions-trigger.html
> If the number of rows you actually update is not very large relative
> to the size of the table, then just for fun, try this:
>
>
> CREATE OR REPLACE FUNCTION BackupFilesCandidateReset(BackupFiles)
> RETURNS BOOL AS
> $$
> SELECT ($1).cStatus IN('NEW', 'WRITING', 'ONTAPE')
> AND
> (($1).nTapeNr, ($1).nAFIOCounter, ($1).nBlockCounter,
> ($1).cStatus, ($1).bOnSetBlue, ($1).bOnSetYellow, ($1).nLastBackupTS)
> IS DISTINCT FROM /* simple != will suffice if values are never null */
> (0, 0, 0, 'NEW'::StatusT, false, false, '0001-01-01 00:00:00');
> $$ LANGUAGE SQL IMMUTABLE;
>
> CREATE INDEX ON BackupFiles(BackupFilesCandidateReset(BackupFiles))
> WHERE BackupFilesCandidateReset(BackupFiles);
>
>
> SELECT * FROM BackupFiles WHERE BackupFilesCandidateReset(BackupFiles);
> UPDATE BackupFiles SET ... WHERE BackupFilesCandidateReset(BackupFiles);
> etc
>
> The idea here is to maintain a partial boolean index representing the
> candidate records to update. Plus, it's nifty. This is the basic
> mechanism that can be used as the foundation for very fast push/pull
> queues.
>
> merlin
>
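
A minimal sketch of attaching the suppress_redundant_updates_trigger()
function Kevin mentions above; the trigger name here is illustrative, not
from the thread. The documentation suggests choosing a name that sorts after
any other per-row triggers on the table, so that it fires last:

    CREATE TRIGGER zzz_suppress_redundant_updates
        BEFORE UPDATE ON BackupFiles
        FOR EACH ROW
        EXECUTE PROCEDURE suppress_redundant_updates_trigger();

With the trigger in place the original UPDATE can be left unchanged: rows
whose new values are identical to the old ones are silently skipped, at the
cost of one extra record comparison per row.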

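To sanity-check Merlin's partial-index approach (assuming the function and
index above have been created, and after an ANALYZE on the table), the
planner should be able to use the small partial index whenever the query's
WHERE clause matches the index predicate:

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT count(*)
    FROM BackupFiles
    WHERE BackupFilesCandidateReset(BackupFiles);

Once a row has been reset by the UPDATE it no longer satisfies the
predicate and drops out of the partial index, which is why Merlin notes
that the same mechanism can serve as the foundation for fast push/pull
queues.
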