Re: Mass updates on a large table

From: "Scott Marlowe" <scott(dot)marlowe(at)gmail(dot)com>
To: "Mark Steben" <msteben(at)autorevenue(dot)com>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: Mass updates on a large table
Date: 2007-08-10 17:57:53
Message-ID: dcc563d10708101057l7dd53571w39a333988a12c84@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

On 8/10/07, Mark Steben <msteben(at)autorevenue(dot)com> wrote:
>
> Good afternoon,
>
> I am attempting an update on two new date field columns on a 17 million row
> table. Every row gets updated.
>
> The update statement is a simple one:
>
> UPDATE EMAILRCPTS SET ID = ID
>
> And the update of the new date fields themselves occurs as the result of a
> before trigger.
>
> The update took 3 days, 10 hours to complete on the testing box.

That's quite a while for only 17 million rows. Are these rows
particularly wide?
Is it possible to do it by groups with a vacuum in between each group?
That would keep the bloat down.

You don't mention your vacuuming strategy. That might affect performance here.

Also, are there any FKs to / from this table?

> To minimize the impact of checkpoints. The SHARED_BUFFERS parameter has
> been bumped up to 140000 on a 20meg RAM box.

I assume you meant 20Gig box.

Under 7.4 larger shared_buffers may not be a good thing. that's a
very large shared buffer setting for 7.4 to handle.

> There are about 9 indexes on this table although none of them reference the
> date fields so since there are no inserts I don't think they would have an
> impact on the update (I've been wrong before though)

Doesn't matter. PostgreSQL's implementation of MVCC means that each
update results in a new row, and therefore each index has to be
updated for each row updated.

> Would an update statement referencing the date fields work faster than a
> trigger?

Possibly.

> Do you have any other suggestions to speed this up?
> We are at Postgres 7.4.5.

Upgrade to a modern version? 7.4 is getting old fast, and 7.4.5 has a
LOT of bugs that have been fixed in later versions. It's up to like
7.4.17 so you're missing a LOT of updates just in the branch you're
in. But upgrading to 8.2.4 would definitely be a help.

> We simply cannot afford this table to be down for 3+ days during a
> production update. The production box is a 32meg RAM box.

I would question the business process that requires an entire 17
million row table be updated.

Also, posting your schema and your triggers might help a bit as well.

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Tom Lane 2007-08-10 18:20:51 Re: Mass updates on a large table
Previous Message Chris Hoover 2007-08-10 17:48:56 Re: Easy way to change table schema?