Re: Different length lines in COPY CSV

From: "Mike G(dot)" <mike(at)thegodshalls(dot)com>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: "Pollard, Mike" <mpollard(at)cincom(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgman(at)candle(dot)pha(dot)pa(dot)us, chriskl(at)familyhealth(dot)com(dot)au, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Different length lines in COPY CSV
Date: 2005-12-12 19:14:28
Message-ID: 20051212191428.GA13700@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I too have to deal with this issue daily since I deal with Windows daily. It sounds like the source of the csv files were originally Excel files. I have never been able to figure out how Excel determines to quit putting null values in empty columns after X rows and resume again later on.

If the file has less than 65000 rows I would suggest using OpenOffice 2.0 instead of Excel. OpenOffice does not stop filling the empty columns and with 2.0 it now supports the same maximum number of rows that Excel does.

I use Perl constantly to "reformat" files and import them as a csv using the COPY command. I think the original poster would prefer a php solution though...

While it is not a problem for me I do have other less technical users who don't know perl and this makes postgres much more difficult for them to use. Most of them come from a M$ Access background which can handle importing of Excel files directly thus don't have to deal with this issue.

A file conversion utility would be very helpful for supporting Postgres with Windows especially if it could handle Excel files in their native format.

Mike

On Mon, Dec 12, 2005 at 07:58:52PM +0100, Martijn van Oosterhout wrote:
> On Mon, Dec 12, 2005 at 10:15:03AM -0500, Pollard, Mike wrote:
> > Tom Lane wrote:
> > > What's been suggested in the past is some sort of standalone
> > > file-format-conversion utility, which could deal with this sort of
> > > stuff without having to also deal with all the backend-internal
> > > considerations that COPY must handle. So (at least in theory) it'd
> > > be simpler and more maintainable. That still seems like a good
> > > idea to me --- in fact, given my druthers I would rather have seen
> > > CSV support done in such an external program.
> >
> > Why not add hooks into COPY to call the user's massage functions? That
> > way you don't have to read and write the data, then read it again to
> > load it into the database.
>
> Well, it does make you wonder about supporting something like (perl
> style):
>
> \copy foo FROM 'myfilter dodgy-data.csv |'
>
> or maybe
>
> \copy foo from pipe 'myfilter dodgy-data.csv'
>
> or possibly
>
> \pipe foo from dodgy-data.csv using autodetecting-format-filter.pl
>
> Which would cause psql to fork/exec the filter and pass the output data
> to the server. We could then provide all sorts of parsers for
> format-of-the-week. This would probably include the converse:
>
> \pipe foo to table.xls using make-excel-spreadsheet.pl
>
> Have a nice day,
> --
> Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> > tool for doing 5% of the work and then sitting around waiting for someone
> > else to do the other 95% so you can sue them.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Mike G. 2005-12-12 19:30:54 PgInstaller error on upgrade
Previous Message Martijn van Oosterhout 2005-12-12 18:58:52 Re: Different length lines in COPY CSV