Re: Breaking up a PostgreSQL COPY command into chunks?

From: Victor Hooi <victorhooi(at)yahoo(dot)com>
To: wd <wd(at)wdicc(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Breaking up a PostgreSQL COPY command into chunks?
Date: 2013-11-08 04:09:18
Message-ID: CAMnnoU+Re2KygbrpuExdbhmBk_ijuRpe96EMArY2pbU0ozDOtw@mail.gmail.com
Lists: pgsql-general

Hi,

Aha, I spoke to somebody; apparently we've actually got those values
set to 15 minutes currently...

They think it might be limited by the network, i.e. how fast the
PostgreSQL server can push the data across the internet. (The Postgres
server and the box running the query are connected over the internet.)

Cheers,
Victor

On Fri, Nov 8, 2013 at 1:44 PM, Victor Hooi <victorhooi(at)yahoo(dot)com> wrote:

> Hi,
>
> Hmm, ok, I'll pass that on to our DBA/operations guys, and see if that
> helps.
>
> Do these settings still work if you only have a single Postgres instance?
> (I'll need to check our setup).
>
> So my understanding is that the default is 30 seconds (
> http://www.postgresql.org/docs/9.3/static/runtime-config-replication.html)
> - but we're increasing it to 600 seconds, and that should give the COPY
> command enough time to pull down the data?
>
> As a rough guide, the dumped CSV file is around 600 MB.
>
> Is there any other background you might be able to give on what you think
> might be happening, or how this might fix it?
>
> And you'd recommend tweaking these values over trying to chunk up the
> COPY/SELECT, is that right?
>
> I've just realised the LIMIT/OFFSET approach may not work well for
> pagination, since there may be new or deleted records between each time I
> call it?
>
> Cheers,
> Victor
>
>
> On Fri, Nov 8, 2013 at 1:15 PM, wd <wd(at)wdicc(dot)com> wrote:
>
>> Try this,
>>
>> max_standby_archive_delay = 600s    # max delay before canceling queries
>>                                     # when reading WAL from archive;
>>                                     # -1 allows indefinite delay
>> max_standby_streaming_delay = 600s  # max delay before canceling queries
>>                                     # when reading streaming WAL;
>>                                     # -1 allows indefinite delay
>>
>> or try
>> pg_xlog_replay_pause()
>> pg_xlog_replay_resume()
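>>
>> For example, in Python, something roughly like this around the copy_expert
>> call (an untested sketch; pausing replay needs superuser on the standby, and
>> the connection details, file name, and table name here are made up):
>>
>> import psycopg2
>>
>> conn = psycopg2.connect("host=standby dbname=mydb")  # hypothetical DSN
>> cur = conn.cursor()
>> cur.execute("SELECT pg_xlog_replay_pause();")  # stop applying WAL so recovery can't cancel the COPY
>> try:
>>     with open("dump.csv", "w") as f:
>>         cur.copy_expert("COPY (SELECT * FROM my_table) TO STDOUT WITH CSV", f)
>> finally:
>>     cur.execute("SELECT pg_xlog_replay_resume();")  # let the standby catch up again
>> conn.close()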
>>
>>
>>
>>
>> On Fri, Nov 8, 2013 at 10:06 AM, Victor Hooi <victorhooi(at)yahoo(dot)com> wrote:
>>
>>> Hi,
>>>
>>> We're using psycopg2 with COPY to dump CSV output from a large query.
>>>
>>> The actual SELECT query itself is large (both in the number of
>>> records/columns, and in the width of the column values), but it still
>>> completes in under a minute on the server.
>>>
>>> However, if you then use a COPY with it, it will often time out.
>>>
>>> We're using psycopg2 to run the command; the traceback we get is something
>>> like:
>>>
>>> Traceback (most recent call last):
>>>   File "foo.py", line 259, in <module>
>>>     jobs[job].run_all()
>>>   File "foo.py", line 127, in run_all
>>>     self.export_to_csv()
>>>   File "foo.py", line 168, in export_to_csv
>>>     cur.copy_expert(self.export_sql_statement, f)
>>> psycopg2.extensions.TransactionRollbackError: canceling statement due to conflict with recovery
>>> DETAIL:  User was holding shared buffer pin for too long.
>>>
>>> My question is, what are some simple ways we can use to chunk up the
>>> query?
>>>
>>> Could we pull down a list of all the ids (an auto-incrementing int), split
>>> that list into ranges, and then use a WHERE clause to run multiple COPY
>>> commands?
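>>>
>>> e.g. something roughly like this (an untested sketch; the table/column
>>> names and chunk size are made up, and it assumes cur is an open psycopg2
>>> cursor):
>>>
>>> CHUNK = 100000  # ids per COPY, so each statement finishes quickly
>>> cur.execute("SELECT min(id), max(id) FROM my_table")
>>> lo, hi = cur.fetchone()
>>> with open("dump.csv", "w") as f:
>>>     for start in range(lo, hi + 1, CHUNK):
>>>         sql = ("COPY (SELECT * FROM my_table "
>>>                "WHERE id >= %d AND id < %d ORDER BY id) "
>>>                "TO STDOUT WITH CSV" % (start, start + CHUNK))
>>>         cur.copy_expert(sql, f)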
>>>
>>> Or would it be better to use LIMIT/OFFSET to break it up? I'm not sure how
>>> we'd figure out when we've reached the end of the result set, though (apart
>>> from just counting the results?).
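>>>
>>> (Maybe a keyset-style loop would work, stopping when a chunk comes back
>>> short? A rough sketch below, again with made-up names and assuming cur is
>>> an open psycopg2 cursor; it does give up COPY's CSV output, though.)
>>>
>>> import csv
>>>
>>> LIMIT = 100000
>>> last_id = 0
>>> with open("dump.csv", "w") as f:
>>>     writer = csv.writer(f)
>>>     while True:
>>>         cur.execute("SELECT * FROM my_table WHERE id > %s ORDER BY id LIMIT %s",
>>>                     (last_id, LIMIT))
>>>         rows = cur.fetchall()
>>>         if not rows:
>>>             break
>>>         writer.writerows(rows)
>>>         last_id = rows[-1][0]  # assumes id is the first column
>>>         if len(rows) < LIMIT:
>>>             break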
>>>
>>> Or are there other approaches you guys could recommend?
>>>
>>> Cheers,
>>> Victor
>>>
>>
>>
>
