Having some problems with concurrent COPY commands

From: Shaun Thomas <bonesmoses(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Having some problems with concurrent COPY commands
Date: 2015-10-12 18:17:53
Message-ID: CAB78C+C9uw4hS+_k2_sR1zgjb1DyL5UonqjmW+WNno3osx=_8g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi guys,

I've been doing some design investigation and ran into an interesting snag
I didn't expect to find on 9.4 (and earlier). I wrote a quick python script
to fork multiple simultaneous COPY commands to several separate tables and
found that performance apparently degrades based on how many COPY commands
are running.

For instance, in the logs with one COPY, I see about one second to import
100k rows. At two processes, it's 2 seconds. At four processes, it's 4
seconds. This is for each process. Thus loading 400k rows takes 16 seconds
cumulatively. To me, it looked like some kind of locking issue, but
pg_locks showed no waits during the load. In trying to figure this out, I
ran across this discussion:

http://www.postgresql.org/message-id/CAB7nPqQJeASxDr0Rt9CJiaf9OnfjoJstyk18iw+oXi-OBO4gYA@mail.gmail.com

Which came after this:

http://forums.enterprisedb.com/posts/list/4048.page

It would appear I'm running into whatever issue the xloginsert_slots patch
tried to address, but not much discussion exists afterwards. It's like the
patch just kinda vanished into the ether even though it (apparently)
massively improves PG's ability to scale data import.

I should note that setting wal_level to minimal, or doing the load on
unlogged tables completely resolves this issue. However, those are not
acceptable settings in a production environment. Is there any other way to
get normal parallel COPY performance, or is that just currently impossible?

I also know 9.5 underwent a lot of locking improvements, so it might not be
relevant. I just haven't gotten a chance to repeat my tests with 9.5 just
yet.

--
Shaun Thomas
bonesmoses(at)gmail(dot)com

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Andres Freund 2015-10-12 18:28:08 Re: Having some problems with concurrent COPY commands
Previous Message Mohammad Habbab 2015-10-11 10:58:16 Re: 3000x Slower query when using Foreign Data Wrapper vs. local