Re: Having some problems with concurrent COPY commands

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Shaun Thomas <bonesmoses(at)gmail(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Having some problems with concurrent COPY commands
Date: 2015-10-12 18:56:23
Message-ID: CAMkU=1xcg0T-YENLb4h8GfmGXaafn3TyWFyKjMs5YdCDWTPstg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Mon, Oct 12, 2015 at 11:17 AM, Shaun Thomas <bonesmoses(at)gmail(dot)com> wrote:

> Hi guys,
>
> I've been doing some design investigation and ran into an interesting snag
> I didn't expect to find on 9.4 (and earlier). I wrote a quick python script
> to fork multiple simultaneous COPY commands to several separate tables and
> found that performance apparently degrades based on how many COPY commands
> are running.
>
> For instance, in the logs with one COPY, I see about one second to import
> 100k rows. At two processes, it's 2 seconds. At four processes, it's 4
> seconds. This is for each process. Thus loading 400k rows takes 16 seconds
> cumulatively. To me, it looked like some kind of locking issue, but
> pg_locks showed no waits during the load. In trying to figure this out, I
> ran across this discussion:
>
>
> http://www.postgresql.org/message-id/CAB7nPqQJeASxDr0Rt9CJiaf9OnfjoJstyk18iw+oXi-OBO4gYA@mail.gmail.com
>
> Which came after this:
>
> http://forums.enterprisedb.com/posts/list/4048.page
>
> It would appear I'm running into whatever issue the xloginsert_slots patch
> tried to address, but not much discussion exists afterwards. It's like the
> patch just kinda vanished into the ether even though it (apparently)
> massively improves PG's ability to scale data import.
>
> I should note that setting wal_level to minimal, or doing the load on
> unlogged tables completely resolves this issue. However, those are not
> acceptable settings in a production environment. Is there any other way to
> get normal parallel COPY performance, or is that just currently impossible?
>
> I also know 9.5 underwent a lot of locking improvements, so it might not
> be relevant. I just haven't gotten a chance to repeat my tests with 9.5
> just yet.
>

Can you provide the test script? Also, have you tuned your database for
high io throughput? What is your storage system like?

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Shaun Thomas 2015-10-12 20:14:26 Re: Having some problems with concurrent COPY commands
Previous Message Andres Freund 2015-10-12 18:28:08 Re: Having some problems with concurrent COPY commands