From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Yang Zhang <yanghatespam(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Optimizing bulk update performance
Date: 2013-04-27 04:41:37
Message-ID: 28751.1367037697@sss.pgh.pa.us
Lists: pgsql-general
Yang Zhang <yanghatespam(at)gmail(dot)com> writes:
> It currently takes up to 24h for us to run a large set of UPDATE
> statements on a database, which are of the form:
> UPDATE table SET field1 = constant1, field2 = constant2, ... WHERE
> id = constid
> (We're just overwriting fields of objects identified by ID.)
Forgive the obvious question, but you do have an index on "id", right?
Have you checked it's being used (ie EXPLAIN ANALYZE on one of these)?
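[Editor's note: as a concrete illustration of the check Tom suggests, something
along these lines should do, where "mytable" stands in for the real table and
42/'x'/'y' are placeholder values matching the field1/field2/id names quoted
above; the plan should show an Index Scan on the id index, not a Seq Scan:]

    -- run one representative UPDATE from the batch under EXPLAIN ANALYZE
    EXPLAIN ANALYZE
    UPDATE mytable SET field1 = 'x', field2 = 'y' WHERE id = 42;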
> The tables have handfuls of indices each and no foreign key constraints.
How much is a "handful"?
> It takes 2h to import a `pg_dump` of the entire DB. This seems like a
> baseline we should reasonably target.
Well, maybe. You didn't say what percentage of the DB you're updating.
But the thing that comes to mind here is that you're probably incurring
a network round trip for each row, and maybe a query-planning round as
well, so you really can't expect that this is going to be anywhere near
as efficient as a bulk load operation. You could presumably get rid of
the planner overhead by using a prepared statement. Cutting the network
overhead is going to require a bit more ingenuity --- could you move
some logic into a stored procedure, perhaps, so that one command from
the client is sufficient to update multiple rows?
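[Editor's note: a rough sketch of both suggestions, not Tom's exact proposal.
The table name "mytable", the function name "bulk_upd", and the int/text
column types are assumptions for illustration only; adapt to the real schema.]

    -- Prepared statement: the UPDATE is planned once, then executed per row
    PREPARE upd(int, text, text) AS
        UPDATE mytable SET field1 = $2, field2 = $3 WHERE id = $1;
    EXECUTE upd(42, 'x', 'y');

    -- One command from the client that updates many rows: pass the new
    -- values as parallel arrays and loop over them server-side
    CREATE OR REPLACE FUNCTION bulk_upd(ids int[], f1 text[], f2 text[])
    RETURNS void LANGUAGE plpgsql AS $$
    BEGIN
        FOR i IN 1 .. array_length(ids, 1) LOOP
            UPDATE mytable SET field1 = f1[i], field2 = f2[i]
            WHERE id = ids[i];
        END LOOP;
    END;
    $$;

    -- single round trip updates three rows
    SELECT bulk_upd(ARRAY[1,2,3], ARRAY['a','b','c'], ARRAY['d','e','f']);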
regards, tom lane