Re: "Ungroup" data for import into PostgreSQL

From: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
To: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, George Weaver <gweaver(at)shaw(dot)ca>, pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: "Ungroup" data for import into PostgreSQL
Date: 2015-01-16 01:02:06
Message-ID: 54B8630E.1020203@aklaver.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 01/15/2015 04:56 PM, Jim Nasby wrote:
> On 1/15/15 9:43 AM, George Weaver wrote:
>> Hi List,
>>
>> I need to import data from a large Excel spreadsheet into a PostgreSQL
>> table. I have a program that uses ODBC to connect to Excel and
>> extract data using SQL queries. The program then inserts the data
>> into a PostgreSQL table.
>>
>> The challenge with this particular spreadsheet is that it is arrayed
>> thus:
>>
>> Model No 1 Product Code 15
>> Serial No 1 No on Hand
>> Serial No 2 No on Hand
>> Serial No 3 No on Hand
>> Model No 4 Product Code 9
>> Serial No 12 No on Hand
>> Model No 5 Product Code 27
>> Serial No 6 No on Hand
>> Serial No 14 No on Hand
>>
>> etc.
>>
>> I need the data in PostgreSQL arrayed thus
>>
>> Model No 1 Product Code 15 Serial No 1 No on Hand
>> Model No 1 Product Code 15 Serial No 2 No on Hand
>> Model No 1 Product Code 15 Serial No 3 No on Hand
>> Model No 4 Product Code 9 Serial No 12 No on Hand
>> Model No 5 Product Code 27 Serial No 6 No on Hand
>> Model No 5 Product Code 27 Serial No 14 No on Hand
>>
>> I can import the data procedurely using plpgsql to match the
>> individual rows to the master for each row (import the raw data into a
>> buffer table in PostgreSQL and then looping through the rows in the
>> buffer table and checking to see when the Model No changes).
>
> Note that if you're doing that you better be putting the rownumber from
> excel into the table... result sets are NOT guaranteed to be in insert
> order!
>
>> I'm wondering if there is a more elegant way to do this using straight
>> sql from Excel?
>
> Well, that's really an excel question, not a Postgres question...
>
> If you load the whole spreadsheet into a single table and have a way to
> differentiate between the different rows then you might be able to do
> something with CTE's to relate a serial number to the product code. That
> might be faster than plpgsql.
>
> You might also be able to do something creative with formulas in excel
> to copy the product code data to the serial # rows. You could then
> import the whole thing and re-normalize it.
>
> There's probably some stuff you could do with VBA too. If you care about
> performance you don't want to execute SQL statements for each
> spreadsheet row.

Or if you really want to slice and dice and you use Python, then take a
look at Pandas:

http://pandas.pydata.org/

In particular the IO functions:

http://pandas.pydata.org/pandas-docs/stable/io.html

--
Adrian Klaver
adrian(dot)klaver(at)aklaver(dot)com

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Fabio Ugo Venchiarutti 2015-01-16 01:12:13 Proper use of pg_xlog_location_diff()
Previous Message Jim Nasby 2015-01-16 00:56:00 Re: "Ungroup" data for import into PostgreSQL