Re: Tuning massive UPDATES and GROUP BY's?

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: fork <forkandwait(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Tuning massive UPDATES and GROUP BY's?
Date: 2011-03-10 16:45:08
Message-ID: AANLkTi=L9vaRA+n=i75-NkBfbcBM--JdrBXjL1UexNcY@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Thu, Mar 10, 2011 at 9:40 AM, fork <forkandwait(at)gmail(dot)com> wrote:
> Given that doing a massive UPDATE SET foo = bar || ' ' || baz; on a 12 million
> row table (with about 100 columns -- the US Census PUMS for the 2005-2009 ACS)
> is never going to be that fast, what should one do to make it faster?
>
> I set work_mem to 2048MB, but it currently is only using a little bit of memory
> and CPU. (3% and 15% according to top; on a SELECT DISTINCT ... LIMIT earlier,
> it was using 70% of the memory).
>
> The data is not particularly sensitive; if something happened and it rolled
> back, that wouldnt be the end of the world.  So I don't know if I can use
> "dangerous" setting for WAL checkpoints etc.   There are also aren't a lot of
> concurrent hits on the DB, though a few.
>
> I am loathe to create a new table from a select, since the indexes themselves
> take a really long time to build.

you are aware that updating the field for the entire table, especially
if there is an index on it (or any field being updated), will cause
all your indexes to be rebuilt anyways? when you update a record, it
gets a new position in the table, and a new index entry with that
position. insert/select to temp, + truncate + insert/select back is
usually going to be faster and will save you the reindex/cluster.
otoh, if you have foreign keys it can be a headache.

> As the title alludes, I will also be doing GROUP BY's on the data, and would
> love to speed these up, mostly just for my own impatience...

need to see the query here to see if you can make them go faster.

merlin

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Merlin Moncure 2011-03-10 18:02:15 Re: Basic performance tuning on dedicated server
Previous Message Merlin Moncure 2011-03-10 16:32:56 Re: NULLS LAST performance