The fastest way to update thousands of rows in moderately sized table

From: twoflower <standa(dot)kurik(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: The fastest way to update thousands of rows in moderately sized table
Date: 2015-07-23 20:17:46
Message-ID: 1437682666316-5859144.post@n5.nabble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hello,I have a table with 30 million records in which I need to update a
single column for a couple of thousands of rows, let's say 10 000. The new
column value is identical for all matching rows.Doing

update "TRANSLATION" set fk_assignmentwhere fk_job = 1000;

takes 45 seconds. I understand that UPDATE is basically an INSERT followed
by DELETE but I was hoping I could do better than that.I found a suggestion
to use a temporary table to speed things up, so now I have this:

create unlogged table "temp_table" as
select id, fk_assignment
from "TRANSLATION"
where fk_job = 1000;

update "temp_table" set fk_assignment = null;

update "TRANSLATION" _target
set fk_assignment = _source.fk_assignment
from "temp_table" _source
where _target.id = _source.id;

drop table "temp_table";

This got me to about 37 seconds. Still pretty slow.The TRANSLATION has an
index and a foreign key constraint on fk_assignment. Removing the constraint
brought very little benefit. Removing the index is probably out of question
as these kind of operations are very frequent and the table itself is used
heavily, including the index.Execution plan:

Update on "TRANSLATION" _target (cost=0.56..116987.76 rows=13983 width=405)
(actual time=43262.266..43262.266 rows=0 loops=1)
&nbsp;-> Nested Loop (cost=0.56..116987.76 rows=13983 width=405) (actual
time=0.566..146.084 rows=8920 loops=1)
&nbsp;&nbsp;&nbsp;-> Seq Scan on temp_segs _source (cost=0.00..218.83
rows=13983 width=22) (actual time=0.457..13.994 rows=8920 loops=1)
&nbsp;&nbsp;&nbsp;-> Index Scan using "TRANSLATION_pkey" on "TRANSLATION"
_target (cost=0.56..8.34 rows=1 width=391) (actual time=0.009..0.011 rows=1
loops=8920)
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Index Cond: (id = _source.id)

Planning time: 1.167 ms
Execution time: 43262.577 ms

Is there anything else worth trying? Are these numbers something to be
expected, from your experience?

I have Postgres 9.4, the database is on SSD.

Thank you very much for any suggestions.

Standa

--
View this message in context: http://postgresql.nabble.com/The-fastest-way-to-update-thousands-of-rows-in-moderately-sized-table-tp5859144.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Adrian Klaver 2015-07-23 20:48:36 Re: Using the database to validate data
Previous Message William Dunn 2015-07-23 19:50:46 Re: Setting up HA postgresql