Re: Tuning massive UPDATES and GROUP BY's?

From:	runner <runner(at)winning(dot)com>
To:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: Tuning massive UPDATES and GROUP BY's?
Date:	2011-03-14 15:54:39
Message-ID:	8CDB0773FE92165-27F8-BD4@web-mmc-m04.sysops.aol.com
Lists:	pgsql-performance

> Bulk data imports of this size I've done with minimal pain by simply
> breaking the raw data into chunks (10M records becomes 10 files of
> 1M records), on a separate spindle from the database, and performing
> multiple COPY commands but no more than 1 COPY per server core.
> I tested this a while back on a 4 core server and when I attempted 5
> COPY's at a time the time to complete went up almost 30%. I don't
> recall any benefit having fewer than 4 in this case but the server was
> only processing my data at the time. Indexes were on the target table
> however I dropped all constraints. The UNIX split command is handy
> for breaking the data up into individual files.

I'm not using COPY. My dump file is a bunch of INSERT INTO statements. I know it would be faster to use COPY. If I can figure out how to do this in one hour I will try it. I did two mysqldumps, one with INSERT INTO and one as CSV, so I can try COPY at a later time. I'm running five parallel psql processes to import the data, which has been broken out by table.
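
For reference, the chunk-and-parallel-COPY approach quoted above might be sketched roughly as follows. This is only an illustration under several assumptions: the function names, chunk file naming, database name and table name are all hypothetical, the CSV has no newlines embedded inside quoted fields (so splitting on line boundaries is safe), and the script runs on the database server itself, so that os.cpu_count() reflects the server's cores. It uses psql's client-side \copy rather than server-side COPY.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def split_file(path, lines_per_chunk, out_dir):
    """Split a text file into numbered chunk files and return their paths.

    Assumes one record per line (no newlines inside quoted CSV fields).
    """
    os.makedirs(out_dir, exist_ok=True)
    chunk_paths = []
    out = None
    with open(path, newline="") as src:
        for i, line in enumerate(src):
            if i % lines_per_chunk == 0:
                # Start a new chunk file every lines_per_chunk records.
                if out:
                    out.close()
                name = f"chunk_{len(chunk_paths):04d}.csv"
                chunk_path = os.path.join(out_dir, name)
                chunk_paths.append(chunk_path)
                out = open(chunk_path, "w", newline="")
            out.write(line)
    if out:
        out.close()
    return chunk_paths


def copy_command(dbname, table, chunk_path):
    """Build a psql invocation that loads one chunk via client-side \\copy."""
    meta = f"\\copy {table} FROM '{chunk_path}' WITH (FORMAT csv)"
    return ["psql", "-d", dbname, "-c", meta]


def load_chunks(dbname, table, chunk_paths, workers=None):
    """Run one psql per chunk, at most `workers` at a time.

    Defaults to one concurrent load per core, following the quoted advice.
    Returns the exit code of each psql process.
    """
    workers = workers or os.cpu_count() or 1
    cmds = [copy_command(dbname, table, p) for p in chunk_paths]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda c: subprocess.run(c).returncode, cmds))


if __name__ == "__main__":
    # Hypothetical file, database and table names, for illustration only.
    chunks = split_file("orders.csv", 1_000_000, "chunks")
    print(load_chunks("mydb", "orders", chunks))
```

Whether one load per core is actually the best setting depends on the hardware and on what else the server is doing, so the worker count is worth measuring rather than assuming.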

In response to

Re: Tuning massive UPDATES and GROUP BY's? at 2011-03-14 13:34:28 from Greg Spiegelberg
