| From: | Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com> |
|---|---|
| To: | Lee Hachadoorian <lee(dot)hachadoorian(at)gmail(dot)com> |
| Cc: | pgsql-sql(at)postgresql(dot)org |
| Subject: | Re: Speed up UPDATE query? |
| Date: | 2009-11-01 00:08:06 |
| Message-ID: | dcc563d10910311708s281a81deg81a3968cbc4073ab@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-sql |
On Thu, Oct 29, 2009 at 11:50 AM, Lee Hachadoorian
<lee(dot)hachadoorian(at)gmail(dot)com> wrote:
> I'm trying to update several tables (all child tables of the same
> parent), and as the number of records increases, the length of time it
> takes to run the update is shooting up exponentially. I have imported
> the new data to an import table, and then join the import table to the
> update table. The update statement looks like:
>
> UPDATE
> household_2000 h
> SET
> hhincome = new_hhincome
> FROM (
> SELECT
> serial, new_hhincome
> FROM
> import
> WHERE
> year = 2000
> ) r
> WHERE
> h.serial = r.serial
>
> household_2000 is a child table of a household table that, as you
> might guess, only contains records from the year 2000. I am putting a
> year = 2000 restriction on the import table and then linking on the
> unique identifier.
>
> For different child tables, this is how long the update takes to run
> (numbers are approximate):
>
> Records Cost (via EXPLAIN) Actual time
> 460,000 300,000 23 seconds
> 510,000 320,000 26 seconds
> 1.2 million 670,000 3:16
> 1.3 million 820,000 3:25
> 6.2 million 2.7 million ~2.5 hours
>
> So, the cost estimate given by EXPLAIN seems to be roughly
> proportional to the number of records in the dataset, but the actual
> time it takes to run seems to increase faster than the cost, even for
> the small and medium tables, and shoots through the roof for the large
> tables. Since I need to run this on additional child tables that are
> larger (the largest is 14 million records), I want to know what I can
> do to speed up the query.
>
> Here's the EXPLAIN for the query. Note that the query plan is the same
> for the small, medium, and large tables.
>
> Hash Join (cost=1268532.36..2379787.06 rows=5465837 width=1128)
> Hash Cond: (import_6_17_rev_hh.serial = h.serial)
> -> Bitmap Heap Scan on import_6_17_rev_hh
> (cost=126551.72..308495.69 rows=5465837 width=8)
> Recheck Cond: (year = 1990)
> -> Bitmap Index Scan on import_6_17_rev_hh_pkey
> (cost=0.00..125185.26 rows=5465837 width=0)
> Index Cond: (year = 1990)
> -> Hash (cost=295596.06..295596.06 rows=5527406 width=1124)
> -> Seq Scan on household_1990 h (cost=0.00..295596.06
> rows=5527406 width=1124)
Any chance of getting the output of explain analyze for a fast and a
slow run of this query?
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Andrew Hall | 2009-11-01 18:04:14 | PostgreSQL Security/Roles/Grants |
| Previous Message | Bernd Nawothnig | 2009-10-31 21:11:06 | Re: how to: refer to select list calculations other places in the calculations. |