Re: Breaking up a PostgreSQL COPY command into chunks?

From: Victor Hooi <victorhooi(at)yahoo(dot)com>
To: Francisco Olarte <folarte(at)peoplecall(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Breaking up a PostgreSQL COPY command into chunks?
Date: 2013-11-08 08:28:29
Message-ID: CAMnnoULRY8gdDboH=Z_cdsxVyRC5C7DuCMdb0nFjm8qzaEHOMw@mail.gmail.com
Lists: pgsql-general

Hi,

@Francisco - Yeah, the file is currently around 600 MB, uncompressed.

You're right, our internet connection is going to be the limiting factor.

Essentially, the PostgreSQL server is in a datacentre, and the server we're
dumping to is in the office.

Running a script on the PostgreSQL server in the datacentre is going to be
tricky (not so much technically, just from a procedures/security point of
view).

Dumping to a spare table sounds like an interesting idea - so we'd create
the table, COPY the results into it, paginate through it with LIMIT/OFFSET,
and then drop the table afterwards?
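
Just to check my understanding, here's roughly what I'm picturing - a sketch
only, using psycopg2, with the table name, id column and connection details
all made up:

    import psycopg2

    # Sketch only - table name, id column and connection details are made up.
    conn = psycopg2.connect("host=dbhost dbname=mydb user=me")
    cur = conn.cursor()

    # Materialise the snapshot into a spare table in a single statement.
    cur.execute("CREATE TABLE export_snapshot AS SELECT * FROM big_table")
    conn.commit()

    # Page through the spare table in fixed-size chunks, ordered by id so
    # that the pages are stable.
    cur.execute("SELECT count(*) FROM export_snapshot")
    total_rows = cur.fetchone()[0]
    chunk_size = 100000

    with open("dump.csv", "w") as f:
        for offset in range(0, total_rows, chunk_size):
            sql = ("COPY (SELECT * FROM export_snapshot ORDER BY id "
                   "LIMIT {0} OFFSET {1}) TO STDOUT WITH CSV"
                   .format(chunk_size, offset))
            cur.copy_expert(sql, f)

    # Clean up the spare table once the dump is complete.
    cur.execute("DROP TABLE export_snapshot")
    conn.commit()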

Currently, I'm doing a quick hack where we download an ordered list of the
ids (an auto-incrementing integer) into Python, chunk it up into groups of
ids, and then use a WHERE IN clause to download each chunk via COPY.
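
For reference, the hack is roughly this (psycopg2 again, made-up names,
simplified):

    import psycopg2

    # Sketch of the current hack - table name and connection details are made up.
    conn = psycopg2.connect("host=dbhost dbname=mydb user=me")
    cur = conn.cursor()

    # Download the full ordered list of ids, then chunk it up client-side.
    cur.execute("SELECT id FROM big_table ORDER BY id")
    ids = [row[0] for row in cur.fetchall()]
    chunk_size = 50000

    with open("dump.csv", "w") as f:
        for i in range(0, len(ids), chunk_size):
            id_list = ",".join(str(n) for n in ids[i:i + chunk_size])
            sql = ("COPY (SELECT * FROM big_table WHERE id IN ({0}) "
                   "ORDER BY id) TO STDOUT WITH CSV".format(id_list))
            cur.copy_expert(sql, f)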

Would dumping to a spare table and paginating be a better approach? Reasons?
(Not challenging it, I just want to understand everything).

Cheers,
Victor

On Fri, Nov 8, 2013 at 6:36 PM, Francisco Olarte <folarte(at)peoplecall(dot)com> wrote:

> On Fri, Nov 8, 2013 at 5:09 AM, Victor Hooi <victorhooi(at)yahoo(dot)com> wrote:
> > They think that it might be limited by the network, and how fast the
> > PostgreSQL server can push the data across the internet. (The Postgres
> > server and the box running the query are connected over the internet).
>
> You previously said you had 600 MB. Over the internet. Is it a very
> fat pipe? Because otherwise the limiting factor is probably not the
> speed at which postgres can push the results, but the throughput of
> your link.
>
> If, as you stated, you need a single transaction to get a 600 MB
> snapshot, I would recommend dumping it to disk, compressing on the fly
> (you should easily get a four- or five-fold reduction on a CSV file
> using any decent compressor), and then sending the file. If you do not
> have disk space for the dump but can run programs near the server, you
> can try compressing on the fly. If you have got none of this but have
> got space for a spare table, use a SELECT INTO, paginate that output,
> and drop the table afterwards. Or just look at the configs and set
> longer query timeouts; if your app NEEDS two-hour queries, they can be
> enabled. But anyway, doing a long transaction over the internet does
> not seem like a good idea to me.
>
> Francisco Olarte
>
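
(Noting down for my own reference - I'm assuming the "compress on the fly"
dump you describe would look something like the following, using gzip and
psycopg2, with the table name and connection details made up:)

    import gzip
    import psycopg2

    # Sketch of "dump to disk, compressing on the fly" - names are made up.
    conn = psycopg2.connect("host=dbhost dbname=mydb user=me")
    cur = conn.cursor()

    # Stream the whole result straight into a gzipped file in one COPY.
    with gzip.open("dump.csv.gz", "wt") as f:
        cur.copy_expert(
            "COPY (SELECT * FROM big_table ORDER BY id) TO STDOUT WITH CSV", f)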
