From: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
To: twoflower <standa(dot)kurik(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: The fastest way to update thousands of rows in moderately sized table
Date: 2015-07-23 20:55:55
Message-ID: 55B154DB.30305@aklaver.com
Lists: pgsql-general
On 07/23/2015 01:17 PM, twoflower wrote:
> Hello, I have a table with 30 million records in which I need to update
> a single column for a couple of thousand rows, let's say 10 000. The
> new column value is identical for all matching rows. Doing
>
> update "TRANSLATION" set fk_assignment = null where fk_job = 1000;
>
> takes 45 seconds. I understand that an UPDATE is basically an INSERT
> followed by a DELETE, but I was hoping I could do better than that. I
> found a suggestion to use a temporary table to speed things up, so now I
> have this:
>
> create unlogged table "temp_table" as
> select id, fk_assignment
> from "TRANSLATION"
> where fk_job = 1000;
>
> update "temp_table" set fk_assignment = null;
>
> update "TRANSLATION" _target
> set fk_assignment = _source.fk_assignment
> from "temp_table" _source
> where _target.id = _source.id;
>
> drop table "temp_table";
Have you tried wrapping the above in a BEGIN/COMMIT block?
So:
BEGIN;
create unlogged table "temp_table" as
select id, fk_assignment
from "TRANSLATION"
where fk_job = 1000;
update "temp_table" set fk_assignment = null;
update "TRANSLATION" _target
set fk_assignment = _source.fk_assignment
from "temp_table" _source
where _target.id = _source.id;
drop table "temp_table";
COMMIT;
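
A variant of the same idea, sketched only and not tested against your schema (ON COMMIT DROP and the null::bigint cast are assumptions -- adjust the cast to fk_assignment's actual type): a temporary table created inside the transaction lets you fold the NULL into the CREATE TABLE AS and drops the staging table automatically at commit:

BEGIN;
-- stage the target ids with the new value already in place
create temporary table temp_table on commit drop as
select id, null::bigint as fk_assignment
from "TRANSLATION"
where fk_job = 1000;

update "TRANSLATION" _target
set fk_assignment = _source.fk_assignment
from temp_table _source
where _target.id = _source.id;
COMMIT;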
>
> This got me to about 37 seconds. Still pretty slow. The TRANSLATION
> table has an index and a foreign key constraint on fk_assignment. Removing
> the constraint brought very little benefit. Removing the index is
> probably out of the question, as these kinds of operations are very frequent
> and the table itself is used heavily, including the index. Execution plan:
>
> Update on "TRANSLATION" _target  (cost=0.56..116987.76 rows=13983 width=405) (actual time=43262.266..43262.266 rows=0 loops=1)
>   ->  Nested Loop  (cost=0.56..116987.76 rows=13983 width=405) (actual time=0.566..146.084 rows=8920 loops=1)
>         ->  Seq Scan on temp_segs _source  (cost=0.00..218.83 rows=13983 width=22) (actual time=0.457..13.994 rows=8920 loops=1)
>         ->  Index Scan using "TRANSLATION_pkey" on "TRANSLATION" _target  (cost=0.56..8.34 rows=1 width=391) (actual time=0.009..0.011 rows=1 loops=8920)
>               Index Cond: (id = _source.id)
>
> Planning time: 1.167 ms
> Execution time: 43262.577 ms
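
The plan above shows the nested-loop join itself finishing in roughly 146 ms, so the bulk of the 43 s is presumably spent writing the new row versions, maintaining the indexes, and checking the foreign key. A quick way to confirm where the I/O goes is to re-run the statement with buffer statistics; this is the same update as above, only the EXPLAIN options differ:

explain (analyze, buffers)
update "TRANSLATION" _target
set fk_assignment = _source.fk_assignment
from temp_table _source
where _target.id = _source.id;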
> Is there anything else worth trying? Are these numbers to be
> expected, in your experience?
>
> I have Postgres 9.4, the database is on SSD.
>
> Thank you very much for any suggestions.
>
> Standa
--
Adrian Klaver
adrian(dot)klaver(at)aklaver(dot)com