Re: Update big table

From: Haiming Zhang <Haiming(dot)Zhang(at)redflex(dot)com(dot)au>
To: BladeOfLight16 <bladeoflight16(at)gmail(dot)com>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Update big table
Date: 2013-07-18 00:51:00
Message-ID: 8AE6CD7104B80845B0732DAC65C8B62951C2FC528A@rts-exchange1.traffic.redflex.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Thanks.

I have fixed the problem. And now the query can run in 4 mins for 1.5 million records in TABLE1 and 0.5 million records in TABLE2.

The solution is I created a function that gets all event_id from TABLE2 then travel through each event_id and do the update for TABLE1. This removes the side effects of using IN predicate. Also I have
1. set FILLFACTOR = 70
2. remove trigger and index before the function, create index based on event_id
3. add trigger and index back after run function.

Regards,
Haiming

From: BladeOfLight16 [mailto:bladeoflight16(at)gmail(dot)com]
Sent: Monday, 15 July 2013 5:26 AM
To: pgsql-general(at)postgresql(dot)org; Haiming Zhang
Subject: Re: [GENERAL] Update big table

I don't believe you can use JOIN explicitly in this situation because it's an UPDATE, but I believe you can accomplish the same effect with FROM and WHERE.

UPDATE table SET column1 = TRUE
FROM table2
WHERE table1.event_id = table2.event_id;

I would make sure there's an index on table2.event_id if it's not the PK. If it's part of a composite key, either make sure that table2.event_id is the first column in the index (PostgreSQL will use an index like that; right?) or create an index where it is.

(Sorry. Forgot to hit Reply All.)
On Sun, Jul 14, 2013 at 7:03 AM, Vincenzo Romano <vincenzo(dot)romano(at)notorand(dot)it<mailto:vincenzo(dot)romano(at)notorand(dot)it>> wrote:
2013/7/14 Haiming Zhang <Haiming(dot)Zhang(at)redflex(dot)com(dot)au<mailto:Haiming(dot)Zhang(at)redflex(dot)com(dot)au>>:
> Hi All,
>
>
>
> I am using postgres 9.1, I have a question about updating big table. Here is
> the basic information of this table.
>
> 1. This table has one primary key and other 11 columns.
>
> 2. It also has a trigger that before update records,
> another table got updated first.
>
> 3. The has millions of records now.
> 4. I am not allowed to delete records in this table when
> UPDATE
>
> The problem is when I do a "Update" query it takes a long time to execute.
> Eg. when I run query like this " update TABLE set column1 = true where
> EVENT_ID in (select EVENT_ID from TABLE2 );" , it took hours to update the
> whole table. In order to optimize the update speed. I tried the following
> strategies:
>
> 1. create index based on primary key, column1 and
> combination of primary key and column1.
>
> 2. Alter FILLFACTOR = 70, vacuum all and then reindex
>
> 3. drop trigger before update
>
> Then I use "EXPLAIN" to estimate query plan, all of the above strategies do
> not improve the UPDATE speed dramatically.
>
>
>
> Please comments on my three strategies (eg, does I index too many columns in
> 1?) and please advise me how to improve the update speed. Any advice is
> welcomed. I appreciate all you help.
>
>
>
> Thanks,
>
>
>
> Regards,
>
> Haiming
A JOIN would solve your speed problem.
The IN() predicate is the cause.
AFAIK.

--
Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org<mailto:pgsql-general(at)postgresql(dot)org>)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

________________________________
If you are not an authorised recipient of this e-mail, please contact me at Redflex immediately by return phone call or by email. In this case, you should not read, print, retransmit, store or act in reliance on this e-mail or any attachments, and should destroy all copies of them. This e-mail and any attachments are confidential and may contain privileged information and/or copyright material of Redflex or third parties. You should only retransmit, distribute or commercialise the material if you are authorised to do so. This notice should not be removed.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Luca Ferrari 2013-07-18 06:29:56 Re: pgAdmin for ubuntu
Previous Message David Johnston 2013-07-17 22:56:02 Re: dynamic table names