Re: Having some problems with concurrent COPY commands

From: Shaun Thomas <bonesmoses(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Having some problems with concurrent COPY commands
Date: 2015-10-12 20:14:26
Message-ID: CAB78C+D=4CyZ-33kLvvZMenQWtkNEsmvkMKRPSV3AG2edchgbA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Mon, Oct 12, 2015 at 1:28 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:

> Any chance
> you could provide profiles of such a run?

This is as simple as I could make it reliably. With one copy running,
the thread finishes in about 1 second. With 2, it's 1.5s each, and
with all 4, it's a little over 3s for each according to the logs. I
have log_min_duration_statement set to 1000, so it's pretty obvious.
The scary part is that it's not even scaling linearly; performance is
actually getting *worse* with each subsequent thread.

Regarding performance, all of this fits in memory. The tables are only
100k rows with the COPY statement. The machine itself is 8 CPUs with
32GB of RAM, so it's not an issue of hardware. So far as I can tell,
it happens on every version I've tested on, from 9.2 to 9.4. I also
take back what I said about wal_level. Setting it to minimal does
nothing. Disabling archive_mode and setting max_wal_senders to 0 also
does nothing. With 4 concurrent processes, each takes 3 seconds, for a
total of 12 seconds to import 400k rows when it would take 4 seconds
to do sequentially. Sketchy.

COPY (
SELECT id, id % 100, id % 1000, now() - (id || 's')::INTERVAL
FROM generate_series(1, 100000) a(id)
) TO '/tmp/loadtest1.csv';

CREATE TABLE test_copy (
id SERIAL PRIMARY KEY,
location VARCHAR NOT NULL,
reading BIGINT NOT NULL,
reading_date TIMESTAMP NOT NULL
);

CREATE INDEX idx_test_copy_location ON test_copy (location);
CREATE INDEX idx_test_copy_date ON test_copy (reading_date);

CREATE TABLE test_copy2 (LIKE test_copy INCLUDING INDEXES);
CREATE TABLE test_copy3 (LIKE test_copy INCLUDING INDEXES);
CREATE TABLE test_copy4 (LIKE test_copy INCLUDING INDEXES);

psql -c "COPY test_copy FROM '/tmp/loadtest1.csv'" &>/dev/null &
psql -c "COPY test_copy2 FROM '/tmp/loadtest1.csv'" &>/dev/null &
psql -c "COPY test_copy3 FROM '/tmp/loadtest1.csv'" &>/dev/null &
psql -c "COPY test_copy4 FROM '/tmp/loadtest1.csv'" &>/dev/null &

--
Shaun Thomas
bonesmoses(at)gmail(dot)com
http://bonesmoses.org/

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Graeme B. Bell 2015-10-13 08:47:09 V8 optimisation (if you're using javascript in postgres)
Previous Message Jeff Janes 2015-10-12 18:56:23 Re: Having some problems with concurrent COPY commands