From: | Nicolas Paris <niparisco(at)gmail(dot)com> |
---|---|
To: | Anto Aravinth <anto(dot)aravinth(dot)cse(at)gmail(dot)com> |
Cc: | Tim Cross <theophilusx(at)gmail(dot)com>, Adrien Nayrat <adrien(dot)nayrat(at)anayrat(dot)info>, "pgsql-generallists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Using COPY to import large xml file |
Date: | 2018-06-25 16:31:23 |
Message-ID: | CA+ssMOTyW0GrSN3HkS_v9KAqBS4fMxJHxBM1VORUZjF5Qg=cuw@mail.gmail.com |
Lists: | pgsql-general |
2018-06-25 17:30 GMT+02:00 Anto Aravinth <anto(dot)aravinth(dot)cse(at)gmail(dot)com>:
>
>
> On Mon, Jun 25, 2018 at 8:54 PM, Anto Aravinth <
> anto(dot)aravinth(dot)cse(at)gmail(dot)com> wrote:
>
>>
>>
>> On Mon, Jun 25, 2018 at 8:20 PM, Nicolas Paris <niparisco(at)gmail(dot)com>
>> wrote:
>>
>>>
>>> 2018-06-25 16:25 GMT+02:00 Anto Aravinth <anto(dot)aravinth(dot)cse(at)gmail(dot)com>:
>>>
>>>> Thanks a lot. But I ran into a lot of challenges! It looks like the SO data
>>>> contains a lot of tabs, so the tab delimiter didn't work for me.
>>>> I thought I could use a special delimiter, but it looks like PostgreSQL COPY
>>>> allows only a single character as the delimiter :(
>>>>
>>>> Sad. I guess the only way is to use INSERT, or to do a thorough serialization
>>>> of my data into something that COPY can understand.
>>>>
>>>
>>> The easiest way would be:
>>> xml -> csv -> \copy
>>>
>>> By csv, I mean regular quoted csv (simply wrap each csv field in double
>>> quotes, and escape any double quotes it contains with another double
>>> quote).
>>>
>>
>> I tried, but no luck. Here is a sample of the csv I wrote from my xml
>> converter:
>>
>> 1 "Are questions about animations or comics inspired by Japanese
>> culture or styles considered on-topic?" "pExamples include a href=""
>> http://www.imdb.com/title/tt0417299/"" rel=""nofollow""Avatar/a, a
>> href=""http://www.imdb.com/title/tt1695360/"" rel=""nofollow""Korra/a
>> and, to some extent, a href=""http://www.imdb.com/title/tt0278238/""
>> rel=""nofollow""Samurai Jack/a. They're all widely popular American
>> cartoons, sometimes even referred to as ema href=""
>> https://en.wikipedia.org/wiki/Anime-influenced_animation""
>> rel=""nofollow""Amerime/a/em./p
>>
>>
>> pAre questions about these series on-topic?/p
>>
>> " "pExamples include a href=""http://www.imdb.com/title/tt0417299/""
>> rel=""nofollow""Avatar/a, a href=""http://www.imdb.com/title/tt1695360/""
>> rel=""nofollow""Korra/a and, to some extent, a href=""
>> http://www.imdb.com/title/tt0278238/"" rel=""nofollow""Samurai Jack/a.
>> They're all widely popular American cartoons, sometimes even referred to as
>> ema href=""https://en.wikipedia.org/wiki/Anime-influenced_animation""
>> rel=""nofollow""Amerime/a/em./p
>>
>>
>> pAre questions about these series on-topic?/p
>>
>> " "null"
>>
>> the schema of my table is:
>>
>> CREATE TABLE so2 (
>> id INTEGER NOT NULL PRIMARY KEY,
>> title varchar(1000) NULL,
>> posts text,
>> body TSVECTOR,
>> parent_id INTEGER NULL,
>> FOREIGN KEY (parent_id) REFERENCES so1(id)
>> );
>>
>> and when I run:
>>
>> COPY so2 from '/Users/user/programs/js/node-mbox/file.csv';
>>
>>
>> I get:
>>
>>
>> *ERROR: missing data for column "body"*
>> CONTEXT: COPY so2, line 1: "1 "Are questions about animations or comics
>> inspired by Japanese culture or styles considered on-top..."
>>
>> I'm not sure what I'm missing. I'm not sure whether the above csv is
>> breaking because I have newlines within my content, but the error message
>> makes this very hard to debug.
>>
>>
What you are missing is the configuration of the COPY statement (please refer
to https://www.postgresql.org/docs/9.2/static/sql-copy.html),
such as format, delimiter, quote and escape.
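
To illustrate the xml -> csv step, here is a minimal Python sketch. It is not the converter used in this thread. It serializes rows as regular quoted csv, so embedded tabs, newlines and double quotes survive. The sample row and the names rows_to_csv, my_table and file.csv are hypothetical. With COPY's default text format every newline ends a row, which is probably why the multi-line sample above fails with "missing data".

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize rows as quoted csv text.

    Fields containing the delimiter, a double quote or a newline are wrapped
    in double quotes, and embedded double quotes are doubled. None is written
    as an empty unquoted field, which COPY's csv format reads as NULL by
    default.
    """
    buf = io.StringIO()
    writer = csv.writer(
        buf,
        delimiter=",",
        quotechar='"',
        doublequote=True,
        quoting=csv.QUOTE_MINIMAL,
        lineterminator="\n",
    )
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical row with embedded double quotes, a newline and a tab.
sample = [(1, 'A "quoted" title', "line one\n\tline two", None)]
text = rows_to_csv(sample)

# A COPY invocation with matching options (placeholder table and file names,
# not run as part of this sketch):
#   \copy my_table FROM 'file.csv' WITH (FORMAT csv, DELIMITER ',', QUOTE '"', ESCAPE '"')
```

Writing NULLs as empty unquoted fields is a deliberate choice in this sketch: in csv format a quoted value such as "null" is loaded as the literal string, not as NULL, so it would be rejected by an INTEGER column.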