Re: import CSV file to a table

From: Rob Sargent <robjsargent(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: import CSV file to a table
Date: 2017-03-08 16:24:19
Message-ID: 7858a99b-0ae4-4d0d-c2cc-56571f00d51f@gmail.com
Lists: pgsql-general

Since bash has been bandied about in this thread, I presume awk is
available. Here's how I would check just how 'csv'-ish the incoming file is.

awk -F"," '{a[NF]++}END{for(i in a){printf "%d lines have %d fields (columns)\n", a[i], i}}' csvfilename

If this doesn't produce one line of output, you have to suspect quoted
values containing commas (or whatever delimiter you chose). Then you
need a real csv parser.
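
If you do end up there, Python's stdlib csv module can do the same
field-count tally while honoring quoting; a rough sketch, assuming
python3 is on hand ("csvfilename" is the same placeholder as above):

# quote-aware version of the awk check, via Python's csv module
python3 - csvfilename <<'PY'
import csv, sys
from collections import Counter

# tally rows by their parsed field count; quoted commas don't split
with open(sys.argv[1], newline='') as f:
    counts = Counter(len(row) for row in csv.reader(f))
for nfields, nlines in sorted(counts.items()):
    print(f"{nlines} lines have {nfields} fields (columns)")
PY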

If it's just one line, I'd use cut to get rid of unwanted columns, then
let COPY do its thing.
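
For example (column numbers, table name and database name are all
invented for illustration):

# keep columns 1, 3 and 5, then load with psql's client-side \copy
cut -d',' -f1,3,5 csvfilename > trimmed.csv
psql -d mydb -c "\copy mytable from 'trimmed.csv' with (format csv, header true)"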

On 03/08/2017 09:13 AM, Karl Czajkowski wrote:
> On Mar 08, John McKown modulated:
> ...
>> I agree. I went with a "pure BASH" approach because it is what the
>> user asked for & I wasn't sure what language she might be comfortable
>> with. I use PERL a lot. Or maybe I should say that I abuse PERL a lot.
>> Such as a PERL script which writes out another PERL script, based on
>> some input files & parameters, then runs the just written PERL script,
>> which does the load into a PostgreSQL database (multiple tables). Ya, a
>> bit perverted.
>>
> Well, you could follow a similar meta-programming/code-generating
> pattern to have the BASH script output a single SQL file to run with
> psql. You could even generate PL/pgsql code to defer more data
> processing to the database itself.
>
> I think the only robust "pure BASH" approach is to use a temporary
> table, so you aren't trying to parse CSV content in BASH. Using csvkit
> sounds good if you can introduce these third-party dependencies.
>
> With the temporary table, you can use SQL for most validation or data
> interrogation, but you need to know at least enough schema information
> in advance to form the COPY statement. Parsing the CSV header row to
> plan your work puts you right back to requiring a robust CSV parser
> unless you can constrain your input scenarios to only handle very
> trivial headers.
>
> If you play games with a defaulting serial column and fixed column
> names like "id, c1, c2, ..., cN" for the temporary table, you might
> use the id column as a DB-assigned "row number" during COPY and
> validation. In this case, you could even tell Postgres there is no
> header, and then let it parse the header as another data record so you
> can use SQL statements to determine the actual header names and
> ordering in the input. But this still requires knowing the column
> count in advance of the COPY.
>
> I also think using something like Python with structured data
> processing would be wiser, unless you know enough about the schema in
> advance to avoid any CSV parsing on the client side.
>
>
> Karl
>
>
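
For what it's worth, a minimal sketch of the temporary-table idea,
assuming a known three-column file; every name below is invented:

# stage the file with a serial "row number"; header comes in as data
psql -d mydb <<'SQL'
CREATE TEMP TABLE staging (id serial, c1 text, c2 text, c3 text);
\copy staging (c1, c2, c3) from 'csvfilename' with (format csv)
-- without HEADER, the header row (if any) is just the record with
-- the smallest id, so plain SQL can inspect the real column names:
SELECT c1, c2, c3 FROM staging ORDER BY id LIMIT 1;
SQL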
