Quick Links

Re: COPY v. java performance comparison

From:	Rob Sargent <robjsargent(at)gmail(dot)com>
To:	pgsql-general(at)postgresql(dot)org
Subject:	Re: COPY v. java performance comparison
Date:	2014-04-02 20:14:58
Message-ID:	533C6FC2.70604@gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

On 04/02/2014 01:56 PM, Steve Atkins wrote:
> On Apr 2, 2014, at 12:37 PM, Rob Sargent <robjsargent(at)gmail(dot)com> wrote:
>
>> I'm playing with various data models to compare performance and practicalities and not sure if I should be surprised by the numbers I'm getting. I hope this report isn't too wishy-washy for reasoned comment.
>>
>> One model says a genotype is defined as follows:
>> Table "public.oldstyle"
>> +-------------+--------------+-----------+
>> | Column | Type | Modifiers |
>> +-------------+--------------+-----------+
>> | id | uuid | not null |
>> | sample_name | text | not null |
>> | marker_name | text | not null |
>> | allele1 | character(1) | |
>> | allele2 | character(1) | |
>> +-------------+--------------+-----------+
>> (0. id is a Primary Key)
>> (1. Take what you will from the table name.)
>> (2. I hadn't thought of "char" type at this point)
>> (3. Ultimately the names would become ids, RI included)
>> (4. We're loading 39 samples and ~950K markers)
>> I loaded 37M+ records using jOOQ (batching every 1000 lines) in 12+ hours (800+ records/sec). Then I tried COPY and killed that after 11.25 hours when I realised that I had added on non-unque index on the name fields after the first load. By that point is was on line 28301887, so ~0.75 done which implies it would have take ~15hours to complete.
>>
>> Would the overhead of the index likely explain this decrease in throughput?
>>
>> Impatience got the better of me and I killed the second COPY. This time it had done 54% of the file in 6.75 hours, extrapolating to roughly 12 hours to do the whole thing.
> That seems rather painfully slow. How exactly are you doing the bulk load? Are you CPU limited or disk limited?
>
> Have you read http://www.postgresql.org/docs/current/interactive/populate.html ?
>
> Cheers,
> Steve
>
The copy command was pretty vanilla:

copy oldstyle from '/export/home/rob/share/testload/<file-redacted>'
with delimiter ' ';

I've been to that page, but (as I read them) none sticks out as a sure
thing. I'm not so worried about the actual performance as I am with the
relative throughput (sixes so far).

I'm not cpu bound, but I confess I didn't look at io stats during the
copy runs. I just assume it was pegged :)

Thanks,

In response to

Re: COPY v. java performance comparison at 2014-04-02 19:56:59 from Steve Atkins

Responses

Re: COPY v. java performance comparison at 2014-04-02 20:36:27 from Adrian Klaver
Re: COPY v. java performance comparison at 2014-04-02 20:55:13 from Steve Atkins

Browse pgsql-general by date

	From	Date	Subject
Next Message	Adrian Klaver	2014-04-02 20:27:56	Re: Bug in user pg_ident.conf mapping code?
Previous Message	Shaun Thomas	2014-04-02 20:14:43	Re: SSD Drives