Re: exceptionally large UPDATE

From: Rob Sargent <robjsargent(at)gmail(dot)com>
To: Ivan Sergio Borgonovo <mail(at)webthatworks(dot)it>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: exceptionally large UPDATE
Date: 2010-10-28 04:05:07
Message-ID: 4CC8F673.3020409@gmail.com
Lists: pgsql-general

Ivan Sergio Borgonovo wrote:
> I have to make a large UPDATE to a DB.
> The largest UPDATE involves a table that has triggers and a gin
> index on a computed tsvector.
> The table is 1.5M records with about 15 fields of different types.
> I have roughly 2.5-3GB of RAM dedicated to Postgres.
>
> The UPDATE queries are simple; a few of them use joins, and they
> mainly consist of updating records from temporary tables whose
> structure is very similar to the target's.
>
> These updates are rare, so I can afford to tune PostgreSQL just for
> this large update and then return to a more balanced configuration.
> I can even afford to be the only user of the DB so responsiveness of
> the application using the DB is not an issue. Duration of the update
> is.
>
> Anything I can tune in postgresql.conf to speed up the UPDATE?
>
> I'm increasing maintenance_work_mem to 180MB just before recreating
> the gin index. Should it be more?
> The update should be monolithic and it is inside a single
> transaction. Since I can afford to be the only user of the DB for a
> while, is there anything I can tune to take advantage of it?
> What else could I change to speed up the update?
>
> The triggers recreate the tsvector. One of the components of the
> tsvector is taken from a joined table.
>
> I'll surely drop the gin index and recreate it when everything is
> over.
> I'm not sure if it's a good idea to drop the triggers, since I'll
> have to update the tsvector later and I suspect this will cause twice
> the disk I/O.
>
> thanks
>
>
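
On the postgresql.conf and GIN index questions: roughly the following is
what I would try for a one-off run like this. This is only a sketch; the
object names (items, items_tsv_idx, tsv) are made up and the memory
figures are only illustrative, so scale them to your 2.5-3GB budget:

  -- Session-level settings for the one-off run:
  SET maintenance_work_mem = '512MB';  -- used when rebuilding the GIN index
  SET work_mem = '256MB';              -- helps the joins against the temp tables
  SET synchronous_commit = off;        -- acceptable for a rerunnable batch job

  -- Drop the GIN index, run the big UPDATEs, then rebuild it once at the end:
  DROP INDEX IF EXISTS items_tsv_idx;

  -- ... large UPDATE statements here ...

  CREATE INDEX items_tsv_idx ON items USING gin (tsv);

Rebuilding the index once at the end is generally much cheaper than
maintaining it row by row while the UPDATE runs.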

Is there an inherent value in a single transaction for such an update?
By that I mean: do all the updates actually pertain to a single event?
Nice as it is to get a clean slate when a single record has a problem,
it's also nice when N-1 of N batches succeed in a realistic amount of
time and you're left hunting for the problematic record in only one Nth
of the rows.
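
As a sketch of what I mean (table and column names here are hypothetical),
each slice commits on its own, so one bad row only costs you that slice:

  -- Batch 1
  BEGIN;
  UPDATE items i
     SET price = t.price
    FROM staging_items t
   WHERE i.id = t.id
     AND i.id BETWEEN 1 AND 100000;
  COMMIT;

  -- Batch 2: same statement over the next id range
  BEGIN;
  UPDATE items i
     SET price = t.price
    FROM staging_items t
   WHERE i.id = t.id
     AND i.id BETWEEN 100001 AND 200000;
  COMMIT;

  -- ... and so on, with the ranges driven by a small script.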

Corollary: if you can afford to be the only user for a while, perhaps you
can also afford to reload from a dump if you need to get back to ground
zero.
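
For instance (database and file names made up), a custom-format dump taken
before the run gives you a cheap way back to the starting point:

  # before starting the batch run
  pg_dump -Fc -f before_update.dump mydb

  # if you need to get back to ground zero
  pg_restore --clean -d mydb before_update.dump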
