Quick Links

COPY v. java performance comparison

From:	Rob Sargent <robjsargent(at)gmail(dot)com>
To:	PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject:	COPY v. java performance comparison
Date:	2014-04-02 19:37:24
Message-ID:	533C66F4.60409@gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

I'm playing with various data models to compare performance and
practicalities and not sure if I should be surprised by the numbers I'm
getting. I hope this report isn't too wishy-washy for reasoned comment.

One model says a genotype is defined as follows:

I loaded 37M+ records using jOOQ (batching every 1000 lines) in 12+
hours (800+ records/sec). Then I tried COPY and killed that after 11.25
hours when I realised that I had added on non-unque index on the name
fields after the first load. By that point is was on line 28301887, so
~0.75 done which implies it would have take ~15hours to complete.

Would the overhead of the index likely explain this decrease in throughput?

Impatience got the better of me and I killed the second COPY. This time
it had done 54% of the file in 6.75 hours, extrapolating to roughly 12
hours to do the whole thing.

That matches up with the java speed. Not sure if I should be elated with
jOOQ or disappointed with COPY.

Btw, I can load the roughly the same data in to the model below in 10.5
seconds. It only adds 39 very wide lines. I haven't got to the
practicality bits yet :)

This just a dev desktop environment:
RHEL 6.5
PostgreSQL 9.3.1 on x86_64-unknown-linux-gnu, compiled by gcc (GCC)
4.4.7 20120313 (Red Hat 4.4.7-4 [hm, interesting contradiction]), 64-bit
2 QuadCore cpu MHz: 2925.878
The input file and postgres data are on separate disks, but only one
controller.

Thanks in advance, even if you only read this far.

Responses

Re: COPY v. java performance comparison at 2014-04-02 19:56:59 from Steve Atkins
Re: COPY v. java performance comparison at 2014-04-02 22:36:46 from Jeff Janes
Re: COPY v. java performance comparison at 2014-04-03 15:01:36 from Thomas Kellerer

Browse pgsql-general by date

	From	Date	Subject
Next Message	David Johnston	2014-04-02 19:49:44	Re: pg_stat_activity
Previous Message	Bret Stern	2014-04-02 19:37:05	SSD Drives