Re: Copy From CSV feature request?

From: Richard Huxton <dev(at)archonet(dot)com>
To: mike <mike(at)thegodshalls(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Copy From CSV feature request?
Date: 2005-11-08 08:42:39
Message-ID: 437064FF.5020608@archonet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

mike wrote:
> I import some of my data into my postgres database, win32 platform, via
> the COPY table FROM with CSV. My CSV file is created from a Crystal
> Report (v.9). I run the report and have Crystal export the results into
> a CSV file (using the default settings).
>
> I have some data which looks like this when stored in the source
> application (m$ sql server 2000) and Crystal:
>
> Line 1000 1/1/2004 Company2 Person2 Misc
> Line 1001 1/1/2004 Company3 Person3 " Nickname Misc
> Line 1002 1/1/2004 Company3 Person3 " Nickname Misc
> Line 1003 1/1/2004 Company4 Person4 Misc
> Line 1004 1/1/2004 Company5 Person5 Misc
>
> When I export the report as a CSV file Crystal exports it like this:
>
> Line 1000 "1/1/2004","Company2","Person2","Misc"
> Line 1001 "1/1/2004","Company3","Person3 " Nickname","Misc"
^^^
This is just wrong. Double-quotes inside double-quoted fields need to be
escaped by doubling. As, e.g. here:
http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm

> Line 1002 "1/1/2004","Company3","Person3 " Nickname","Misc"
> Line 1003 "1/1/2004","Company4","Person4","Misc"
> Line 1004 "1/1/2004","Company4","Person4","Misc"
>
> Now when I execute the COPY function it aborts on Line 1002 with the
> error "extra data after last expected column". I am not sure why it
> does not abort on Line 1001.

It's probably seeing the end-of-line as part of the last field on line 1001.

> If Crystal is exporting the data in the correct format should COPY FROM
> CSV be modified to handle an odd number of text qualifiers in a row?
> Does anyone know if this is a valid format (no escape character
> automatically inserted)?

Nope - it's a bad format. Not that there is actually a standard
definition of what CSV is, AFAIK.

> Could COPY be modified to accept the data without having to insert a "
> someplace in the row so that the original data appears the same?

How does PG determine that what you've got is an unescaped " rather than
a missing comma?

> Would it be too much of a performance hit to do this?
>
> How other applications handle it:
>
> Only Excel 2003 seems to display the data correctly.

Well, it silently guesses what you want and gets it right this
particular time.

> If I open the CSV file using OpenOffice Calc 2.0 it combines Line 1001
> and Line 1002 into one row.
>
> If I import the data back into Crystal the data after Person3 does not
> appear.

There's a sign that you've got a bug. If Crystal can't read what it
writes then I'm not sure you can expect anyone else to do so reliably.

> If I import it into Access 2003 it ships Line 1001 and Line 1002 into an
> import error table.

Quite right too. It's one of the areas where Access does the right thing :-)

> Worth adding to the TODO or not a good feature?

I'm not a developer, but it strikes me as double plus ungood.

--
Richard Huxton
Archonet Ltd

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Sreejesh O S 2005-11-08 09:22:10 Is there any other way to compile pgsql without gmake
Previous Message Pavel Stehule 2005-11-08 08:27:03 Re: Supporting NULL elements in arrays