Quick Links

Re: Speed up UPDATE query?

From:	Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To:	Lee Hachadoorian <lee(dot)hachadoorian(at)gmail(dot)com>
Cc:	pgsql-sql(at)postgresql(dot)org
Subject:	Re: Speed up UPDATE query?
Date:	2009-11-01 00:08:06
Message-ID:	dcc563d10910311708s281a81deg81a3968cbc4073ab@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-sql

On Thu, Oct 29, 2009 at 11:50 AM, Lee Hachadoorian
<lee(dot)hachadoorian(at)gmail(dot)com> wrote:
> I'm trying to update several tables (all child tables of the same
> parent), and as the number of records increases, the length of time it
> takes to run the update is shooting up exponentially. I have imported
> the new data to an import table, and then join the import table to the
> update table. The update statement looks like:
>
> UPDATE
> household_2000 h
> SET
> hhincome = new_hhincome
> FROM (
> SELECT
> serial, new_hhincome
> FROM
> import
> WHERE
> year = 2000
> ) r
> WHERE
> h.serial = r.serial
>
> household_2000 is a child table of a household table that, as you
> might guess, only contains records from the year 2000. I am putting a
> year = 2000 restriction on the import table and then linking on the
> unique identifier.
>
> For different child tables, this is how long the update takes to run
> (numbers are approximate):
>
> Records Cost (via EXPLAIN) Actual time
> 460,000 300,000 23 seconds
> 510,000 320,000 26 seconds
> 1.2 million 670,000 3:16
> 1.3 million 820,000 3:25
> 6.2 million 2.7 million ~2.5 hours
>
> So, the cost estimate given by EXPLAIN seems to be roughly
> proportional to the number of records in the dataset, but the actual
> time it takes to run seems to increase faster than the cost, even for
> the small and medium tables, and shoots through the roof for the large
> tables. Since I need to run this on additional child tables that are
> larger (the largest is 14 million records), I want to know what I can
> do to speed up the query.
>
> Here's the EXPLAIN for the query. Note that the query plan is the same
> for the small, medium, and large tables.
>
> Hash Join (cost=1268532.36..2379787.06 rows=5465837 width=1128)
> Hash Cond: (import_6_17_rev_hh.serial = h.serial)
> -> Bitmap Heap Scan on import_6_17_rev_hh
> (cost=126551.72..308495.69 rows=5465837 width=8)
> Recheck Cond: (year = 1990)
> -> Bitmap Index Scan on import_6_17_rev_hh_pkey
> (cost=0.00..125185.26 rows=5465837 width=0)
> Index Cond: (year = 1990)
> -> Hash (cost=295596.06..295596.06 rows=5527406 width=1124)
> -> Seq Scan on household_1990 h (cost=0.00..295596.06
> rows=5527406 width=1124)

Any chance of getting the output of explain analyze for a fast and a
slow run of this query?

In response to

Speed up UPDATE query? at 2009-10-29 17:50:38 from Lee Hachadoorian

Browse pgsql-sql by date

	From	Date	Subject
Next Message	Andrew Hall	2009-11-01 18:04:14	PostgreSQL Security/Roles/Grants
Previous Message	Bernd Nawothnig	2009-10-31 21:11:06	Re: how to: refer to select list calculations other places in the calculations.