Re: Using COPY to import large xml file

From: Anto Aravinth <anto(dot)aravinth(dot)cse(at)gmail(dot)com>
To: Tim Cross <theophilusx(at)gmail(dot)com>
Cc: Adrien Nayrat <adrien(dot)nayrat(at)anayrat(dot)info>, "pgsql-generallists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Using COPY to import large xml file
Date: 2018-06-26 15:08:05
Message-ID: CANtp6RLsRGKx2p541fWSrUaR5sSkvxf5iTO-HQFumuiN45zYQQ@mail.gmail.com
Lists: pgsql-general

Thanks a lot, everyone. After experimenting with a small dataset, I was
able to produce datasets that work well with COPY. Creating around 50GB of
them took about 2 hours (I can definitely improve on this).
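
For anyone following along, here is a minimal sketch of one way such a
conversion could look. It is not the exact script I used. It assumes the
XML stores each record as a `<row .../>` element with the fields as
attributes, and the column names (`Id`, `PostTypeId`, `Body`) and the
function name `xml_rows_to_csv` are hypothetical. It writes CSV rather than
tab-delimited text, so tabs, commas and quotes inside a field are handled by
the CSV quoting rules.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical column list (an assumption); adjust it to match your table.
COLUMNS = ["Id", "PostTypeId", "Body"]


def xml_rows_to_csv(xml_source, csv_out, columns=COLUMNS):
    """Stream <row .../> elements from xml_source into csv_out as CSV.

    Returns the number of rows written. Attributes missing from a row
    are written as empty fields.
    """
    writer = csv.writer(csv_out, lineterminator="\n")
    context = ET.iterparse(xml_source, events=("start", "end"))
    _event, root = next(context)  # first event is the start of the root element
    count = 0
    for event, elem in context:
        if event == "end" and elem.tag == "row":
            writer.writerow([elem.get(name, "") for name in columns])
            count += 1
            # Drop rows that are already processed so memory stays flat
            # even for very large files.
            root.clear()
    return count


if __name__ == "__main__":
    sample = (
        b'<posts>'
        b'<row Id="1" PostTypeId="1" Body="a&#9;b" />'
        b'<row Id="2" PostTypeId="2" Body="say &quot;hi&quot;, ok" />'
        b'</posts>'
    )
    out = io.StringIO()
    xml_rows_to_csv(io.BytesIO(sample), out)
    print(out.getvalue(), end="")
```

A file produced this way should be loadable with something like
`COPY posts FROM '/path/to/posts.csv' WITH (FORMAT csv)`, where the table
name and path are placeholders.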

COPY loaded 54M records in around 35 minutes! Awesome.. :) :)

In the meantime, I also learned a few things, such as vacuum.

Really loving postgres!

Thanks,
Anto.

On Tue, Jun 26, 2018 at 3:40 AM, Tim Cross <theophilusx(at)gmail(dot)com> wrote:

>
> Anto Aravinth <anto(dot)aravinth(dot)cse(at)gmail(dot)com> writes:
>
> > Thanks a lot. But I ran into a lot of challenges! Looks like the SO data
> > contains a lot of tabs.. So the tab delimiter didn't work for me. I
> > thought I could give a special delimiter, but it looks like PostgreSQL
> > COPY allows only one character as the delimiter :(
> >
> > Sad, I guess the only way is to use INSERT or do a thorough
> > serialization of my data into something that COPY can understand.
> >
>
> The COPY command has a number of options, including setting what is used
> as the delimiter - it doesn't have to be tab. You need to also look at
> the logs/output to see exactly why the copy fails.
>
> I'd recommend first pre-processing your input data to make sure it is
> 'clean' and all the fields actually match with whatever DDL you have
> used to define your db tables etc. I'd then select a small subset and
> try different parameters to the copy command until you get the right
> combination of data format and copy definition.
>
> It may take some effort to get the right combination, but the result is
> probably worth it given your data set size i.e. difference between hours
> and days.
>
> --
> Tim Cross
>
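
Tim's suggestion above about checking a small subset before the full load
can be sketched roughly as below. This is only an illustration under
assumptions: the function name `check_sample`, the expected field count and
the choice of integer columns are all hypothetical and would have to match
the real table definition.

```python
import csv
import io
from itertools import islice


def check_sample(csv_file, expected_fields, int_columns=(), sample_size=1000):
    """Check the first sample_size CSV records against simple expectations.

    Returns a list of (record_number, problem) tuples; an empty list means
    the sample looked consistent with the expected layout.
    """
    problems = []
    reader = csv.reader(csv_file)
    for record_no, row in enumerate(islice(reader, sample_size), start=1):
        if len(row) != expected_fields:
            problems.append(
                (record_no, f"expected {expected_fields} fields, got {len(row)}")
            )
            continue
        for idx in int_columns:
            value = row[idx]
            # Empty values are allowed here; they can be loaded as NULL.
            if value and not value.lstrip("-").isdigit():
                problems.append(
                    (record_no, f"column {idx} is not an integer: {value!r}")
                )
    return problems


if __name__ == "__main__":
    data = "1,1,hello\n2,x,world\n3,2\n"
    for record_no, problem in check_sample(io.StringIO(data), 3, int_columns=(0, 1)):
        print(record_no, problem)
```

Running a check like this on a small slice first makes it cheaper to find
the right combination of data format and COPY options before committing to
the full 50GB load.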
