Quick Links

Re: COPY Performance

From:	"Hans Zaunere" <lists(at)zaunere(dot)com>
To:	"'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	<pgsql-general(at)postgresql(dot)org>
Subject:	Re: COPY Performance
Date:	2008-05-05 12:18:07
Message-ID:	007701c8aeaa$13d3cc60$3b7b6520$@com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

> > We're using a statement like this to dump between 500K and >5 million
> > rows.
>
> > COPY(SELECT SomeID FROM SomeTable WHERE SomeColumn > '0')
> > TO '/dev/shm/SomeFile.csv'
>
> > Upon first run, this operation can take several minutes. Upon second
> > run, it will be complete in generally well under a minute.
>
> Hmmm ... define "first" versus "second". What do you do to return it
> to the slow state?

Interesting that you ask. I haven't found a very reliable way to reproduce
this.

Typically, just waiting a while to run the same query the second time will
reproduce this behavior. I restarted postgresql and it was reproduced as
well. However, I can't find a way to flush buffers/etc, to reproduce the
problem on-demand. The first time vs second time is usually simply defined
as the temp. CSV file being created slowly once, and then fast, although
frankly sometimes it's always slow, and sometimes seemingly a bit quicker,
depending on the query.

As I mentioned in my other post, the query is always of the simple form
above - single column greater than or equal to a single value (int or single
byte char).

We had set shared_buffers to 1028M, which at first seemed to make a
significant difference. With some testing, various queries ran under 20
seconds, the "first time", and the second time at the same speed or a bit
quicker.

However, operations then restarted the system (and downgraded to Red Hat
5.0) and now we're back where we started. Even with the increased
shared_buffers, things are slow - perhaps I'll try to increase it more.

Upon first starting postgresql, and executing the above statement, the
process begins to grow to 1gb and then stops. When it stops, the CSV file
starts to "trickle" out to disk.

Best,

In response to

Re: COPY Performance at 2008-05-04 23:41:01 from Tom Lane

Responses

Re: COPY Performance at 2008-05-05 15:01:21 from Scott Marlowe

Browse pgsql-general by date

	From	Date	Subject
Next Message	Albe Laurenz	2008-05-05 12:44:02	Re: operator varchar = integer
Previous Message	Hans Zaunere	2008-05-05 12:11:44	Re: COPY Performance