Re: Load experimentation

From: Ben Brehmer <benbrehmer(at)gmail(dot)com>
To: Andy Colson <andy(at)squeakycode(dot)net>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Load experimentation
Date: 2009-12-10 20:24:06
Message-ID: 4B2158E6.8080504@gmail.com
Lists: pgsql-performance

Hi Andy,

Load is chugging along. We've optimized our postgres conf as much as
possible but are seeing the inevitable I/O bottleneck. I had the same
thought as you (converting the INSERTs into COPY statements) a while
back, but unfortunately each file contains inserts into many different
tables. Potentially I could rip through this with a little MapReduce job
on 50-100 nodes, which is still something I might do.
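
For illustration, a rough sketch of such a filter in Python (the input
file name is a placeholder, and it assumes simple single-line
"INSERT INTO t (cols) VALUES (...);" statements; the naive value split
would break on commas or tabs inside quoted strings):

import re

# Matches simple single-line INSERT statements; anything fancier
# (multi-row VALUES, statements spanning lines) falls through untouched.
insert_re = re.compile(
    r"INSERT INTO (\w+)\s*\(([^)]*)\)\s*VALUES\s*\((.*)\);\s*$",
    re.IGNORECASE)

out_files = {}  # table name -> open output file

with open("dump.sql") as src:  # placeholder input file
    for line in src:
        m = insert_re.match(line)
        if not m:
            continue
        table, cols, values = m.groups()
        if table not in out_files:
            out_files[table] = open(table + ".copy", "w")
        # Naive split: good enough for plain numeric/text values only.
        fields = [v.strip().strip("'") for v in values.split(",")]
        out_files[table].write("\t".join(fields) + "\n")

for f in out_files.values():
    f.close()

Each resulting per-table file could then be loaded in one shot with
COPY tablename FROM '/path/to/tablename.copy';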

One thought we're playing with is taking advantage of 4 x 414GB EBS
volumes in a RAID0 configuration, which would spread disk writes across
four block devices.

Right now I'm wrapping about 1500 inserts in a transaction block. Since
it's an I/O bottleneck, COPY statements might not give me much of an
advantage.
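
If I do end up testing COPY from the client side, a minimal psycopg2
sketch might look like this (the DSN, table name, and columns are
placeholders):

import io
import psycopg2

conn = psycopg2.connect("dbname=load_test")  # placeholder DSN
cur = conn.cursor()

# Stream one batch of tab-separated rows through a single COPY,
# standing in for a ~1500-statement INSERT transaction.
buf = io.StringIO()
for i in range(1500):
    buf.write("%d\tvalue_%d\n" % (i, i))
buf.seek(0)

cur.copy_from(buf, "my_table", columns=("id", "val"))  # hypothetical table
conn.commit()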

It's definitely a work in progress :)

Ben

On 09/12/2009 5:31 AM, Andy Colson wrote:
> On 12/07/2009 12:12 PM, Ben Brehmer wrote:
>> Hello All,
>>
>> I'm in the process of loading a massive amount of data (500 GB). After
>> some initial timings, I'm looking at 260 hours to load the entire 500GB.
>> 10 days seems like an awfully long time so I'm searching for ways to
>> speed this up. The load is happening in the Amazon cloud (EC2), on a
>> m1.large instance:
>> -7.5 GB memory
>> -4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each)
>> -64-bit platform
>>
>>
>> So far I have modified my postgresql.conf file (PostgreSQL 8.1.3). The
>> modifications I have made are as follows:
>>
>> shared_buffers = 786432
>> work_mem = 10240
>> maintenance_work_mem = 6291456
>> max_fsm_pages = 3000000
>> wal_buffers = 2048
>> checkpoint_segments = 200
>> checkpoint_timeout = 300
>> checkpoint_warning = 30
>> autovacuum = off
>>
>>
>> There are a variety of instance types available in the Amazon cloud
>> (http://aws.amazon.com/ec2/instance-types/) including high memory and
>> high CPU. High memory instance types come with 34GB or 68GB of memory.
>> High CPU instance types have a lot less memory (7GB max) but up to 8
>> virtual cores. I am more than willing to change to any of the other
>> instance types.
>>
>> Also, there is nothing else happening on the loading server. It is
>> completely dedicated to the load.
>>
>> Any advice would be greatly appreciated.
>>
>> Thanks,
>>
>> Ben
>>
>
> I'm kind of curious, how goes the load? Is it done yet? Still
> looking at days'n'days to finish?
>
> I was thinking... If the .sql files are really nicely formatted, it
> would not be too hard to whip up a perl script to run as a filter to
> change the statements into COPY statements.
>
> Each file would have to only fill one table, and only contain inserts,
> and all the insert statements would have to set the same fields. (And
> I'm sure there could be other problems).
>
> Also, just for the load, did you disable fsync?
>
> -Andy
>
