From: | Richard Huxton <dev(at)archonet(dot)com> |
---|---|
To: | mike <mike(at)thegodshalls(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Copy From CSV feature request? |
Date: | 2005-11-08 08:42:39 |
Message-ID: | 437064FF.5020608@archonet.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
mike wrote:
> I import some of my data into my postgres database, win32 platform, via
> the COPY table FROM with CSV. My CSV file is created from a Crystal
> Report (v.9). I run the report and have Crystal export the results into
> a CSV file (using the default settings).
>
> I have some data which looks like this when stored in the source
> application (m$ sql server 2000) and Crystal:
>
> Line 1000 1/1/2004 Company2 Person2 Misc
> Line 1001 1/1/2004 Company3 Person3 " Nickname Misc
> Line 1002 1/1/2004 Company3 Person3 " Nickname Misc
> Line 1003 1/1/2004 Company4 Person4 Misc
> Line 1004 1/1/2004 Company5 Person5 Misc
>
> When I export the report as a CSV file Crystal exports it like this:
>
> Line 1000 "1/1/2004","Company2","Person2","Misc"
> Line 1001 "1/1/2004","Company3","Person3 " Nickname","Misc"
^^^
This is just wrong. Double-quotes inside double-quoted fields need to be
escaped by doubling. As, e.g. here:
http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm
> Line 1002 "1/1/2004","Company3","Person3 " Nickname","Misc"
> Line 1003 "1/1/2004","Company4","Person4","Misc"
> Line 1004 "1/1/2004","Company4","Person4","Misc"
>
> Now when I execute the COPY function it aborts on Line 1002 with the
> error "extra data after last expected column". I am not sure why it
> does not abort on Line 1001.
It's probably seeing the end-of-line as part of the last field on line 1001.
> If Crystal is exporting the data in the correct format should COPY FROM
> CSV be modified to handle an odd number of text qualifiers in a row?
> Does anyone know if this is a valid format (no escape character
> automatically inserted)?
Nope - it's a bad format. Not that there is actually a standard
definition of what CSV is, AFAIK.
> Could COPY be modified to accept the data without having to insert a "
> someplace in the row so that the original data appears the same?
How does PG determine that what you've got is an unescaped " rather than
a missing comma?
> Would it be too much of a performance hit to do this?
>
> How other applications handle it:
>
> Only Excel 2003 seems to display the data correctly.
Well, it silently guesses what you want and gets it right this
particular time.
> If I open the CSV file using OpenOffice Calc 2.0 it combines Line 1001
> and Line 1002 into one row.
>
> If I import the data back into Crystal the data after Person3 does not
> appear.
There's a sign that you've got a bug. If Crystal can't read what it
writes then I'm not sure you can expect anyone else to do so reliably.
> If I import it into Access 2003 it ships Line 1001 and Line 1002 into an
> import error table.
Quite right too. It's one of the areas where Access does the right thing :-)
> Worth adding to the TODO or not a good feature?
I'm not a developer, but it strikes me as double plus ungood.
--
Richard Huxton
Archonet Ltd
From | Date | Subject | |
---|---|---|---|
Next Message | Sreejesh O S | 2005-11-08 09:22:10 | Is there any other way to compile pgsql without gmake |
Previous Message | Pavel Stehule | 2005-11-08 08:27:03 | Re: Supporting NULL elements in arrays |