From: Anto Aravinth <anto(dot)aravinth(dot)cse(at)gmail(dot)com>
To: Tim Cross <theophilusx(at)gmail(dot)com>
Cc: Adrien Nayrat <adrien(dot)nayrat(at)anayrat(dot)info>, pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Using COPY to import large xml file
Date: 2018-06-25 01:38:52
Message-ID: CANtp6RLW-UuRr2xQkVt74S_+u3Ad_bVM7npGNzF8BE1YyZwEWA@mail.gmail.com
Lists: pgsql-general
On Mon, Jun 25, 2018 at 3:44 AM, Tim Cross <theophilusx(at)gmail(dot)com> wrote:
>
> Anto Aravinth <anto(dot)aravinth(dot)cse(at)gmail(dot)com> writes:
>
> > Thanks for the response. I'm not sure how long this tool takes for
> > the 70GB of data.
> >
> > I used node to stream the xml files into inserts.. which was very slow..
> > Actually the xml contains 40 million records, out of which 10 million took
> > around 2 hrs using nodejs. Hence, I thought I would use the COPY command, as
> > suggested on the internet.
> >
> > Definitely, I will try the code and let you know.. But looks like it uses the
> > same INSERT, not COPY.. interesting to see if it runs quickly on my machine.
> >
> > On Sun, Jun 24, 2018 at 9:23 PM, Adrien Nayrat <adrien(dot)nayrat(at)anayrat(dot)info>
> > wrote:
> >
> >> On 06/24/2018 05:25 PM, Anto Aravinth wrote:
> >> > Hello Everyone,
> >> >
> >> > I have downloaded the Stackoverflow posts xml (contains all SO questions till
> >> > date).. the file is around 70GB.. I wanna import the data in those xml to my
> >> > table.. is there a way to do so in postgres?
> >> >
> >> >
> >> > Thanks,
> >> > Anto.
> >>
> >> Hello Anto,
> >>
> >> I used this tool:
> >> https://github.com/Networks-Learning/stackexchange-dump-to-postgres
> >>
>
> If you are using nodejs, then you can easily use the pg-copy-streams
> module to insert the records into your database. I've been using this
> for inserting large numbers of records from NetCDF files. It takes between
> 40 and 50 minutes to insert 60 million+ records, and we are doing
> additional calculations on the values, not just inserting them; plus we
> are inserting over the network into a database which is also performing
> other processing.
>
> We found a significant speed improvement with COPY over blocks of insert
> transactions, which in turn was faster than just individual inserts. The only
> downside of using COPY is that it either completely works or completely
> fails, and when it fails it can be tricky to work out which record is
> causing the failure. A benefit of using blocks of transactions is that
> you have more fine-grained control, allowing you to recover from some
> errors or provide more specific detail about the cause of the error.
>
Sure, let me try that.. I have a question here: COPY usually works when you
move data from files into your postgres instance, right? Now, with node.js
processing the whole file, can I use COPY programmatically, something like
COPY Stackoverflow <calculated value at run time>? Because from the doc:
https://www.postgresql.org/docs/9.2/static/sql-copy.html
I don't see that it's possible. Maybe I need to convert the files into a
COPY-understandable format first? Or is something like the sketch below,
using COPY ... FROM STDIN through pg-copy-streams, roughly what you have in mind?
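(A rough, untested sketch only; the table name "posts", its columns, and the
row source are placeholders I made up, and real values would also need the
usual COPY text-format escaping.)

// Rough sketch, assuming node-postgres (pg) and pg-copy-streams.
// "posts" and its columns are made-up placeholders, not the real SO schema.
const { Client } = require('pg');
const { from: copyFrom } = require('pg-copy-streams');

async function loadRows(rows) {
  const client = new Client();   // connection settings taken from the environment
  await client.connect();
  try {
    // COPY ... FROM STDIN reads whatever we write to the returned stream,
    // so rows computed at run time can go straight in -- no intermediate file.
    const copyStream = client.query(
      copyFrom('COPY posts (id, title, body) FROM STDIN')
    );
    for (const row of rows) {
      // Default COPY text format: tab-separated columns, one row per line.
      // Tabs/newlines/backslashes inside values would need escaping;
      // write() backpressure is ignored here for brevity.
      copyStream.write(`${row.id}\t${row.title}\t${row.body}\n`);
    }
    copyStream.end();
    // The COPY succeeds or fails as a whole, as you mentioned.
    await new Promise((resolve, reject) => {
      copyStream.on('finish', resolve);
      copyStream.on('error', reject);
    });
  } finally {
    await client.end();
  }
}

If that is roughly right, then I wouldn't need to write an intermediate file
at all.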
Anto.
>
> Be wary of what indexes you're defining on your table. Depending on their
> type and number, these can have a significant impact on insert times as
> well.
>
>
> --
> Tim Cross
>