Re: Breaking up a PostgreSQL COPY command into chunks?

From: Victor Hooi <victorhooi(at)yahoo(dot)com>
To: Francisco Olarte <folarte(at)peoplecall(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Breaking up a PostgreSQL COPY command into chunks?
Date: 2013-11-08 08:28:29
Message-ID: CAMnnoULRY8gdDboH=Z_cdsxVyRC5C7DuCMdb0nFjm8qzaEHOMw@mail.gmail.com
Lists: pgsql-general

Hi,

@Francisco - Yeah, the file is currently around 600 MB, uncompressed.

You're right, our internet connection is going to be the limiting factor.

Essentially, the PostgreSQL server is in a datacentre, and the server we're
dumping to is in the office.

Running a script on the PostgreSQL server in the datacentre is going to be
tricky (not so much technically, just from a procedures/security point of
view).

Dumping to a spare table sounds like an interesting idea - so we'd create
the table, COPY the results into it, paginate through it with LIMIT/OFFSET,
and then drop the table afterwards?
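
Just to check my understanding, here's roughly what I'm picturing - a sketch
only, using psycopg2, with the table name, id column and connection details
all made up:

    import psycopg2

    # Sketch only - table name, id column and connection details are made up.
    conn = psycopg2.connect("host=dbhost dbname=mydb user=me")
    cur = conn.cursor()

    # Materialise the snapshot into a spare table in a single statement.
    cur.execute("CREATE TABLE export_snapshot AS SELECT * FROM big_table")
    conn.commit()

    # Page through the spare table in fixed-size chunks, ordered by id so
    # that the pages are stable.
    cur.execute("SELECT count(*) FROM export_snapshot")
    total_rows = cur.fetchone()[0]
    chunk_size = 100000

    with open("dump.csv", "w") as f:
        for offset in range(0, total_rows, chunk_size):
            sql = ("COPY (SELECT * FROM export_snapshot ORDER BY id "
                   "LIMIT {0} OFFSET {1}) TO STDOUT WITH CSV"
                   .format(chunk_size, offset))
            cur.copy_expert(sql, f)

    # Clean up the spare table once the dump is complete.
    cur.execute("DROP TABLE export_snapshot")
    conn.commit()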

Currently, I'm doing a quick hack where we download an ordered list of the
ids (an auto-incrementing integer) into Python, chunk it up into groups of
ids, and then use a WHERE IN clause to download each chunk via COPY.
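
For reference, the hack is roughly this (psycopg2 again, made-up names,
simplified):

    import psycopg2

    # Sketch of the current hack - table name and connection details are made up.
    conn = psycopg2.connect("host=dbhost dbname=mydb user=me")
    cur = conn.cursor()

    # Download the full ordered list of ids, then chunk it up client-side.
    cur.execute("SELECT id FROM big_table ORDER BY id")
    ids = [row[0] for row in cur.fetchall()]
    chunk_size = 50000

    with open("dump.csv", "w") as f:
        for i in range(0, len(ids), chunk_size):
            id_list = ",".join(str(n) for n in ids[i:i + chunk_size])
            sql = ("COPY (SELECT * FROM big_table WHERE id IN ({0}) "
                   "ORDER BY id) TO STDOUT WITH CSV".format(id_list))
            cur.copy_expert(sql, f)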

Would dumping to a spare table and paginating be a better approach? Reasons?
(Not challenging it, I just want to understand everything).

Cheers,
Victor

On Fri, Nov 8, 2013 at 6:36 PM, Francisco Olarte <folarte(at)peoplecall(dot)com> wrote:

> On Fri, Nov 8, 2013 at 5:09 AM, Victor Hooi <victorhooi(at)yahoo(dot)com> wrote:
> > They think that it might be limited by the network, and how fast the
> > PostgreSQL server can push the data across the internet. (The Postgres
> > server and the box running the query are connected over the internet).
>
> You previously said you had 600 MB. Over the internet. Is it a very
> fat pipe? Because otherwise the limiting factor is probably not the
> speed at which postgres can push the results, but the throughput of
> your link.
>
> If, as you stated, you need a single transaction to get a 600 MB
> snapshot, I would recommend dumping it to disk, compressing on the fly
> (you should easily get a four- or five-fold reduction on a CSV file
> using any decent compressor), and then sending the file. If you do not
> have disk space for the dump but can run programs near the server, you
> can try compressing on the fly. If you have got none of this but have
> got space for a spare table, use a SELECT INTO, paginate that output,
> and drop the table afterwards. Or just look at the configs and set
> longer query timeouts; if your app NEEDS two-hour queries, they can be
> enabled. But anyway, doing a long transaction over the internet does
> not seem like a good idea to me.
>
> Francisco Olarte
>
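
(Noting down for my own reference - I'm assuming the "compress on the fly"
dump you describe would look something like the following, using gzip and
psycopg2, with the table name and connection details made up:)

    import gzip
    import psycopg2

    # Sketch of "dump to disk, compressing on the fly" - names are made up.
    conn = psycopg2.connect("host=dbhost dbname=mydb user=me")
    cur = conn.cursor()

    # Stream the whole result straight into a gzipped file in one COPY.
    with gzip.open("dump.csv.gz", "wt") as f:
        cur.copy_expert(
            "COPY (SELECT * FROM big_table ORDER BY id) TO STDOUT WITH CSV", f)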
