Re: Using COPY to import large xml file

From: Anto Aravinth <anto(dot)aravinth(dot)cse(at)gmail(dot)com>
To: Tim Cross <theophilusx(at)gmail(dot)com>
Cc: Adrien Nayrat <adrien(dot)nayrat(at)anayrat(dot)info>, pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Using COPY to import large xml file
Date: 2018-06-25 01:38:52
Message-ID: CANtp6RLW-UuRr2xQkVt74S_+u3Ad_bVM7npGNzF8BE1YyZwEWA@mail.gmail.com
Lists: pgsql-general

On Mon, Jun 25, 2018 at 3:44 AM, Tim Cross <theophilusx(at)gmail(dot)com> wrote:

>
> Anto Aravinth <anto(dot)aravinth(dot)cse(at)gmail(dot)com> writes:
>
> > Thanks for the response. I'm not sure how long this tool takes for the
> > 70GB data.
> >
> > I used node to stream the xml file into inserts, which was very slow.
> > Actually the xml contains 40 million records, of which 10 million took
> > around 2 hrs using nodejs. Hence, I thought I would use the COPY command,
> > as suggested on the internet.
> >
> > Definitely, I will try the code and let you know. But it looks like it uses
> > the same INSERT, not COPY; it will be interesting to see if it runs quickly
> > on my machine.
> >
> > On Sun, Jun 24, 2018 at 9:23 PM, Adrien Nayrat <adrien(dot)nayrat(at)anayrat(dot)info>
> > wrote:
> >
> >> On 06/24/2018 05:25 PM, Anto Aravinth wrote:
> >> > Hello Everyone,
> >> >
> >> > I have downloaded the Stackoverflow posts xml (contains all SO questions
> >> > till date). The file is around 70GB. I want to import the data in that
> >> > xml into my table. Is there a way to do so in postgres?
> >> >
> >> >
> >> > Thanks,
> >> > Anto.
> >>
> >> Hello Anto,
> >>
> >> I used this tool:
> >> https://github.com/Networks-Learning/stackexchange-dump-to-postgres
> >>
>
> If you are using nodejs, then you can easily use the pg-copy-streams
> module to insert the records into your database. I've been using this
> for inserting large numbers of records from NetCDF files. It takes between
> 40 and 50 minutes to insert 60 million+ records, and we are doing
> additional calculations on the values, not just inserting them, plus we
> are inserting over the network into a database which is also performing
> other processing.
>
> We found a significant speed improvement with COPY over blocks of insert
> transactions, which in turn were faster than individual inserts. The only
> downside with using COPY is that it either completely works or completely
> fails, and when it fails, it can be tricky to work out which record is
> causing the failure. A benefit of using blocks of transactions is that you
> have more fine-grained control, allowing you to recover from some errors or
> to get more specific detail regarding the cause of the error.
>

Sure, let me try that. I have a question here: COPY usually works when you
move data from files to your postgres instance, right? Now in node.js, while
processing the whole file, can I use COPY programmatically, like COPY
Stackoverflow <calculated value at run time>? Because from the doc:

https://www.postgresql.org/docs/9.2/static/sql-copy.html

I don't see that it's possible. Maybe I need to convert the file into a
COPY-understandable format first?

Anto.
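
For reference, COPY can read FROM STDIN as well as from a server-side file,
and the pg-copy-streams module exposes that as a writable stream, so rows
computed at run time can be streamed straight in without writing an
intermediate file. A minimal sketch along the lines Tim describes, assuming
the pg and pg-copy-streams npm modules; the "posts" table, its columns, and
the record fields are hypothetical stand-ins for the real Stack Overflow
schema, and the rows are assumed to be already parsed out of the XML:

    // Hedged sketch: stream rows computed at run time into PostgreSQL via
    // COPY ... FROM STDIN. Table and column names are hypothetical.
    const { Client } = require('pg');
    const { from: copyFrom } = require('pg-copy-streams');

    // Escape one value for COPY's default text format (tab-separated, \N for NULL).
    function copyField(v) {
      if (v === null || v === undefined) return '\\N';
      return String(v)
        .replace(/\\/g, '\\\\')
        .replace(/\t/g, '\\t')
        .replace(/\n/g, '\\n');
    }

    async function load(records) {          // records: iterable of parsed XML rows
      const client = new Client();          // connection settings from the environment
      await client.connect();
      try {
        const stream = client.query(copyFrom('COPY posts (id, title, body) FROM STDIN'));
        const finished = new Promise((resolve, reject) => {
          stream.on('error', reject);
          stream.on('finish', resolve);     // newer pg-copy-streams versions
          stream.on('end', resolve);        // older versions signalled completion here
        });
        for (const r of records) {
          stream.write([r.id, r.title, r.body].map(copyField).join('\t') + '\n');
        }
        stream.end();
        await finished;                     // the whole COPY succeeds or fails as one unit
      } finally {
        await client.end();
      }
    }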

>
> Be wary of what indexes you're defining on your table. Depending on the
> type and number, these can have a significant impact on insert times as
> well.
>
>
> --
> Tim Cross
>
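
For comparison, the "blocks of insert transactions" approach mentioned above
trades some speed for a per-batch failure boundary: a bad record only aborts
its own block, which can then be reported or retried. A rough sketch, again
assuming the pg module and the same hypothetical table and columns:

    // Hedged sketch of the batched-INSERT alternative: each block of rows runs
    // in its own transaction, so a bad record only aborts that block.
    const { Client } = require('pg');

    async function loadInBatches(records, batchSize = 1000) {
      const client = new Client();
      await client.connect();
      try {
        for (let i = 0; i < records.length; i += batchSize) {
          const batch = records.slice(i, i + batchSize);
          try {
            await client.query('BEGIN');
            for (const r of batch) {
              await client.query(
                'INSERT INTO posts (id, title, body) VALUES ($1, $2, $3)',
                [r.id, r.title, r.body]
              );
            }
            await client.query('COMMIT');
          } catch (err) {
            await client.query('ROLLBACK');
            // The failing block is known, so it can be logged, repaired, or retried.
            console.error('batch starting at row ' + i + ' failed: ' + err.message);
          }
        }
      } finally {
        await client.end();
      }
    }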
