From: | Anto Aravinth <anto(dot)aravinth(dot)cse(at)gmail(dot)com> |
---|---|
To: | Nicolas Paris <niparisco(at)gmail(dot)com> |
Cc: | Tim Cross <theophilusx(at)gmail(dot)com>, Adrien Nayrat <adrien(dot)nayrat(at)anayrat(dot)info>, "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Using COPY to import large xml file |
Date: | 2018-06-25 15:24:28 |
Message-ID: | CANtp6RJqKeFgAzPOY1sb-Gbz451JSi9CKEvD31yJvnGKXrDh9A@mail.gmail.com |
Lists: | pgsql-general |
On Mon, Jun 25, 2018 at 8:20 PM, Nicolas Paris <niparisco(at)gmail(dot)com> wrote:
>
> 2018-06-25 16:25 GMT+02:00 Anto Aravinth <anto(dot)aravinth(dot)cse(at)gmail(dot)com>:
>
>> Thanks a lot. But I ran into a lot of challenges! It looks like the SO data
>> contains a lot of tabs, so the tab delimiter didn't work for me. I thought
>> I could use a special delimiter, but it looks like PostgreSQL COPY allows
>> only a single character as the delimiter :(
>>
>> Sadly, I guess the only way is to use INSERTs or to do a thorough
>> serialization of my data into something that COPY can understand.
>>
>
> The easiest way would be:
> xml -> csv -> \copy
>
> By csv, I mean regular quoted CSV (simply wrap each CSV field in double
> quotes, and escape any double quotes it contains with another double
> quote).
>
I tried, but no luck. Here is a sample of the CSV that I wrote from my XML converter:
1 "Are questions about animations or comics inspired by Japanese
culture or styles considered on-topic?" "pExamples include a href=""
http://www.imdb.com/title/tt0417299/"" rel=""nofollow""Avatar/a, a href=""
http://www.imdb.com/title/tt1695360/"" rel=""nofollow""Korra/a and, to some
extent, a href=""http://www.imdb.com/title/tt0278238/""
rel=""nofollow""Samurai Jack/a. They're all widely popular American
cartoons, sometimes even referred to as ema href=""
https://en.wikipedia.org/wiki/Anime-influenced_animation""
rel=""nofollow""Amerime/a/em./p
pAre questions about these series on-topic?/p
" "pExamples include a href=""http://www.imdb.com/title/tt0417299/""
rel=""nofollow""Avatar/a, a href=""http://www.imdb.com/title/tt1695360/""
rel=""nofollow""Korra/a and, to some extent, a href=""
http://www.imdb.com/title/tt0278238/"" rel=""nofollow""Samurai Jack/a.
They're all widely popular American cartoons, sometimes even referred to as
ema href=""https://en.wikipedia.org/wiki/Anime-influenced_animation""
rel=""nofollow""Amerime/a/em./p
pAre questions about these series on-topic?/p
" "null"
The schema of my table is:
CREATE TABLE so2 (
    id INTEGER NOT NULL PRIMARY KEY,
    title varchar(1000) NULL,
    posts text,
    body TSVECTOR,
    parent_id INTEGER NULL,
    FOREIGN KEY (parent_id) REFERENCES so1(id)
);
and when I run:
COPY so2 from '/Users/user/programs/js/node-mbox/file.csv';
I get:
CONTEXT: COPY so2, line 1: "1 "Are questions about animations or comics
inspired by Japanese culture or styles considered on-top..."
Not sure what I'm missing. I'm not sure whether the above CSV is breaking
because I have newlines within my content, but the error message is very
hard to debug.
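
One thing that may be worth checking: COPY reads its default text format unless
FORMAT csv is requested, and in text format double quotes are not special and a
newline ends the row. Below is a minimal sketch in Python (not the Node.js
converter used here) of the quoting convention Nicolas describes, followed by a
COPY statement that asks for CSV parsing. The sample row, the tab delimiter, and
the column list (which leaves out the tsvector column) are assumptions made for
illustration only.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize rows as tab-delimited CSV, quoting fields only when needed."""
    buf = io.StringIO()
    writer = csv.writer(
        buf,
        delimiter="\t",
        quotechar='"',
        doublequote=True,  # an embedded " is written as ""
        quoting=csv.QUOTE_MINIMAL,
        lineterminator="\n",
    )
    writer.writerows(rows)
    return buf.getvalue()


# Made-up row with an embedded tab, a newline, and double quotes.
# None is written as an empty unquoted field, which COPY's csv format
# reads as NULL by default.
sample_rows = [(1, "A title", 'a href="http://example.com/"\tlink\nsecond line', None)]
csv_text = rows_to_csv(sample_rows)

# Round trip: a CSV-aware reader recovers the tab and the newline intact.
parsed = list(csv.reader(io.StringIO(csv_text), delimiter="\t"))

# Assumed COPY invocation; the column list skipping "body" is hypothetical.
COPY_SQL = r"""
COPY so2 (id, title, posts, parent_id)
FROM '/Users/user/programs/js/node-mbox/file.csv'
WITH (FORMAT csv, DELIMITER E'\t');
"""
```

In csv format a quoted "null" is read as the literal string rather than as
NULL, which is why this sketch writes missing values as empty unquoted fields.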
>
> The PostgreSQL COPY CSV parser is one of the most robust I have ever
> tested.
>