From: | "Hans Zaunere" <lists(at)zaunere(dot)com> |
---|---|
To: | <pgsql-general(at)postgresql(dot)org> |
Subject: | COPY Performance |
Date: | 2008-05-04 23:11:35 |
Message-ID: | 00db01c8ae3c$32c76be0$985643a0$@com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Hello,
We're using a statement like this to dump between 500K and >5 million rows.
COPY(SELECT SomeID FROM SomeTable WHERE SomeColumn > '0')
TO '/dev/shm/SomeFile.csv'
Upon first run, this operation can take several minutes. Upon second run,
it will be complete in generally well under a minute.
Some observations/thoughts:
-- When watching the file grow, it will "trickle" upon first run, increasing
in size by only a few kb/s. Upon second run, it will grow by megabytes per
second.
-- If we dump to disk, the same behavior is observed - no apparent RAM
contention issues.
-- The SELECT by itself will complete typically in a few seconds, either on
second or first run.
-- Upon the first run, the PostgreSQL process typically consumes <10% CPU.
-- In very rare cases, we've seen even the first run be as fast as the
second run.
-- There is no other activity on the server while performing these tests.
-- SomeID is just an int
-- CPU usage (wait %) is low; seems as though there is some other bottleneck
Any thoughts on:
-- tuning parameters to speed up the initial execution
-- why is it so slow to begin with?
-- possible areas to check for bottlenecks?
-- better method for accomplishing the same thing?
Thank you,
H
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2008-05-04 23:41:01 | Re: COPY Performance |
Previous Message | Greg Smith | 2008-05-04 23:03:29 | Re: Interesting comments about fsync on Linux |