From: Victor Hooi <victorhooi(at)yahoo(dot)com>
To: Francisco Olarte <folarte(at)peoplecall(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Breaking up a PostgreSQL COPY command into chunks?
Date: 2013-11-08 08:28:29
Message-ID: CAMnnoULRY8gdDboH=Z_cdsxVyRC5C7DuCMdb0nFjm8qzaEHOMw@mail.gmail.com
Lists: pgsql-general
Hi,
@Francisco - Yeah, the file is around 600 Mb currently, uncompressed.
You're right, our internet connection is going to be the limiting factor.
Essentially, the PostgreSQL server is in a datacentre, and the server we're
dumping to is in the office.
Running a script on the PostgreSQL server in the datacentre is going to be
tricky (not so much technically, just from a procedures/security point of
view).
Dumping to a spare table seems like an interesting point - so we'd just
create the table, COPY the results to that table, then use LIMIT/OFFSET to
paginate through that, then drop the table afterwards?
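Just so I'm picturing it right, something like the following, I imagine (a
sketch only - I've used CREATE TABLE ... AS for the "copy into a spare table"
step, and psycopg2, the connection details, table/column names and chunk size
are all my own assumptions):

import psycopg2

conn = psycopg2.connect("dbname=mydb user=myuser host=dbserver")
cur = conn.cursor()

# 1. Materialise the query result into a spare table.
cur.execute("CREATE TABLE export_snapshot AS SELECT * FROM big_table")
conn.commit()

# 2. Page through the spare table in fixed-size chunks, COPYing each
#    page out to its own CSV file.
cur.execute("SELECT count(*) FROM export_snapshot")
total = cur.fetchone()[0]
chunk_size = 100000

for offset in range(0, total, chunk_size):
    sql = ("COPY (SELECT * FROM export_snapshot ORDER BY id "
           "LIMIT %d OFFSET %d) TO STDOUT WITH CSV" % (chunk_size, offset))
    with open("snapshot_%08d.csv" % offset, "w") as f:
        cur.copy_expert(sql, f)

# 3. Drop the spare table once everything has been downloaded.
cur.execute("DROP TABLE export_snapshot")
conn.commit()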
Currently, I'm doing a quick hack where we download an ordered list of the
ids (auto-incrementing integer) into Python, chunk it up into groups of
ids, then use a WHERE IN clause to download each chunk via COPY.
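Roughly, the hack looks something like this (simplified, with made-up
connection details and table/column names, using psycopg2):

import psycopg2

conn = psycopg2.connect("dbname=mydb user=myuser host=dbserver")
cur = conn.cursor()

# Pull down just the ordered ids (cheap compared to the full rows).
cur.execute("SELECT id FROM big_table ORDER BY id")
ids = [row[0] for row in cur]

chunk_size = 100000
for start in range(0, len(ids), chunk_size):
    chunk = ids[start:start + chunk_size]
    # The ids are our own auto-incrementing integers, so interpolating
    # them into the IN (...) list is safe enough for this hack.
    id_list = ",".join(str(i) for i in chunk)
    sql = ("COPY (SELECT * FROM big_table WHERE id IN (%s) ORDER BY id) "
           "TO STDOUT WITH CSV" % id_list)
    with open("chunk_%08d.csv" % start, "w") as f:
        cur.copy_expert(sql, f)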
Would dumping to a spare table and paginating be a better approach? Reasons?
(Not challenging it, I just want to understand everything).
Cheers,
Victor
On Fri, Nov 8, 2013 at 6:36 PM, Francisco Olarte <folarte(at)peoplecall(dot)com> wrote:
> On Fri, Nov 8, 2013 at 5:09 AM, Victor Hooi <victorhooi(at)yahoo(dot)com> wrote:
> > They think that it might be limited by the network, and how fast the
> > PostgreSQL server can push the data across the internet. (The Postgres
> > server and the box running the query are connected over the internet).
>
> You previously said you had 600Mb. Over the internet. Is it a very
> fat pipe? Because otherwise the limiting factor is probably not the
> speed at which postgres can push the results, but the throughput of
> your link.
>
> If, as you stated, you need a single transaction to get a 600Mb
> snapshot, I would recommend dumping it to disk, compressing on the fly
> (you should easily get a four- or five-fold reduction on a CSV file
> using any decent compressor), and then sending the file. If you do not
> have disk for the dump but can run programs near the server, you can
> try compressing on the fly. If you have got none of this but have got
> space for a spare table, use a select into, paginate that output and
> drop it afterwards. Or just look at the configs and set longer query
> times; if your app NEEDS two-hour queries, they can be enabled. But
> anyway, doing a long transaction over the internet does not seem like
> a good idea to me.
>
> Francisco Olarte
>
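For reference, the compress-on-the-fly dump Francisco describes might look
something like this if run on or near the database server (a sketch only -
psycopg2, Python's gzip module, and the connection/table names are my
assumptions):

import gzip
import psycopg2

conn = psycopg2.connect("dbname=mydb user=myuser host=localhost")
cur = conn.cursor()

# Stream the COPY output straight into a gzip file, so the full
# uncompressed CSV never has to sit on disk before being sent over.
with gzip.open("snapshot.csv.gz", "wb") as f:
    cur.copy_expert("COPY (SELECT * FROM big_table) TO STDOUT WITH CSV", f)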