Re: COPY v. java performance comparison

From: Steve Atkins <steve(at)blighty(dot)com>
To: PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: COPY v. java performance comparison
Date: 2014-04-02 20:55:13
Message-ID: D02C467A-2B1E-4114-AFBB-6787D0080480@blighty.com
Lists: pgsql-general


On Apr 2, 2014, at 1:14 PM, Rob Sargent <robjsargent(at)gmail(dot)com> wrote:

> On 04/02/2014 01:56 PM, Steve Atkins wrote:
>> On Apr 2, 2014, at 12:37 PM, Rob Sargent <robjsargent(at)gmail(dot)com> wrote:
>>
>>>
>>> Impatience got the better of me and I killed the second COPY. This time it had done 54% of the file in 6.75 hours, extrapolating to roughly 12 hours to do the whole thing.
>>>
>> That seems rather painfully slow. How exactly are you doing the bulk load? Are you CPU limited or disk limited?
>>
>> Have you read
>> http://www.postgresql.org/docs/current/interactive/populate.html
>> ?
>>
>> Cheers,
>> Steve
>>
>>
> The copy command was pretty vanilla:
> copy oldstyle from '/export/home/rob/share/testload/<file-redacted>' with delimiter ' ';
> I've been to that page, but (as I read them) none sticks out as a sure thing. I'm not so worried about the actual performance as I am with the relative throughput (sixes so far).
>
> I'm not cpu bound, but I confess I didn't look at io stats during the copy runs. I just assume it was pegged :)

If each row is, say, 100 bytes including the per-row overhead (plausible for a uuid and a couple of strings), and you're inserting 800 rows a second, that's only about 80 kB/second, which would be fairly pathetic.
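For anyone who wants to redo that back-of-the-envelope sum with their own numbers, here is a minimal sketch. The 100 bytes per row and 800 rows per second are the guesses from the paragraph above, not measurements, and the function name is just something made up for illustration.

```python
def load_throughput_bytes_per_sec(row_bytes: int, rows_per_sec: float) -> float:
    """Bytes written per second for a given row size and insert rate."""
    return row_bytes * rows_per_sec


# Assumed figures from the estimate above (guesses, not measured values).
ROW_BYTES = 100      # row size including per-row overhead
ROWS_PER_SEC = 800   # observed-ish insert rate

throughput = load_throughput_bytes_per_sec(ROW_BYTES, ROWS_PER_SEC)
print(f"{throughput / 1000:.0f} kB/s")
```

Plug in the real average row size and the real row count divided by elapsed time to see how far off the guess is.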

On my laptop (which has an SSD, sure, but it's still a laptop) I can insert 40M rows of data, each with a few integers and a few small strings, in about 52 seconds. And that's just a simple, single-threaded load, using psql to run copy from stdin, reading from the same disk as the DB is on, with no tuning of any parameters to speed up the load.
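To put the two rates side by side, here is a rough sketch of the comparison. The 40M rows in 52 seconds is the laptop figure above; the 800 rows per second is the assumed rate for the slow load, so the ratio is only as good as that guess.

```python
# Laptop load described above: 40M rows in roughly 52 seconds.
LAPTOP_ROWS = 40_000_000
LAPTOP_SECONDS = 52

# Assumed rate for the slow load (a guess, not a measurement).
SLOW_ROWS_PER_SEC = 800

laptop_rate = LAPTOP_ROWS / LAPTOP_SECONDS   # rows per second on the laptop
ratio = laptop_rate / SLOW_ROWS_PER_SEC      # how many times faster the laptop is

print(f"laptop: {laptop_rate:,.0f} rows/s, about {ratio:,.0f}x the slow load")
```

Even allowing for wider rows and slower disks, a gap of roughly three orders of magnitude is not something tuning alone would explain.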

12 hours suggests there's something fairly badly wrong with what you're doing. I'd definitely look at the server logs, check system load and double-check what you're actually running.

(Running the same thing on a tiny VM, one that shares a single RAID5 of 7200rpm drives with about 40 other VMs, takes a shade under two minutes, and is mostly CPU bound.)

Cheers,
Steve
