From: | Mischa Sandberg <mischa(dot)sandberg(at)telus(dot)net> |
---|---|
To: | pgsql-perform <pgsql-performance(at)postgresql(dot)org> |
Subject: | Re: COPY vs INSERT |
Date: | 2005-05-03 18:53:17 |
Message-ID: | 1115146397.4277c89dc37b4@webmail.telus.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
> Steven Rosenstein <srosenst(at)us(dot)ibm(dot)com> writes:
> > My question is, are there any advantages, drawbacks, or outright
> > restrictions to using multiple simultaneous COPY commands to load
> data into
> > the same table?
Do you mean, multiple COPY commands (connections) being putline'd from
the same thread (process)? I have indirect evidence that this may hurt.
Two copy commands from different threads/processes are fine, and can
help, if they alternate contention on some other resource (disk/CPU).
I'm basing this on being at the third generation of a COPY
implementation. The app loads about 1M objects/hour from 6 servers.
Each object is split across four tables.
The batch load opens four connections and firehoses records down each.
A batch is 10K objects.
COPY invokes all the same logic as INSERT on the server side
(rowexclusive locking, transaction log, updating indexes, rules).
The difference is that all the rows are inserted as a single
transaction. This reduces the number of fsync's on the xlog,
which may be a limiting factor for you. You'll want to crank
WAL_BUFFERS and CHECKPOINT_SEGMENTS to match, though.
One of my streams has 6K records; I run with WB=1000, CS=128.
The downside I found with multiple clients inserting large blocks of
rows was, that they serialized. I THINK that's because at some point
they all needed to lock the same portions of the same indexes. I'm still
working on how to avoid that, tuning the batch size and inserting into a
"queue" table with fewer indexes.
COPY (via putline) didn't do measurably better than INSERT until I
batched 40 newline-separate rows into one putline call, which improved
it 2-3:1. The suspect problem was stalling on the TCP stream; the driver
was flushing small packets. This may or may not be relevant to you;
depends on how much processing (waiting) your app does between posting
of rows.
In such a case, writing alternately to two TCP streams from the same
process increases the likelihood of a stall. I've never tested that
set-up; it would have been heading AWAY from the solution in my case.
Hope that helps.
--
Engineers think equations approximate reality.
Physicists think reality approximates the equations.
Mathematicians never make the connection.
From | Date | Subject | |
---|---|---|---|
Next Message | Christopher Petrilli | 2005-05-03 19:17:20 | Re: batch inserts are "slow" |
Previous Message | Steve Wampler | 2005-05-03 18:29:54 | Re: batch inserts are "slow" |