Re: Creating composite keys from csv

From: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Creating composite keys from csv
Date: 2015-03-09 13:36:43
Message-ID: 54FDA1EB.5040200@aklaver.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 03/08/2015 08:49 PM, Eli Murray wrote:
> Hi all,
>
> I'm a student journalist working on a project for our student paper
> which lists salaries and positions for every staff member at the
> university. We received the data from an FOI request but the university
> is refusing to give us primary keys for the data.
>
> The issue we've run into is that if there are two staff members with the
> same name (and there are) our current web app adds their salaries
> together and considers them one person. Now, luckily, we can create a
> composite key if we combine their name column with their salary column.
> Unfortunately, the format of the data we have makes it more difficult
> than that (of course!) because some employees can hold multiple paying
> positions.
>
> Here's some example data:
>
> Name, Position, Salary,Total Salary, ...
> Jane Doe, Dean, 100.000, 148.000, ...
> John Locke, Custodian, 30.000, 30.000, ...
> Jane Doe, Academic Adviser, 48.000, 148.000, ...
> Jane Doe, Trainer, 46.000, 46.000, ...
>
> Basically, what we'd like to do is create a serial primary key but
> instead of having it increment every row, it needs to check the name and
> total salary columns and only increment if that person doesn't already
> exist. If they do exist, it should just assign the previously created
> number to the column.

Well the above is not going to work, because the id would not be unique
across rows and therefore could not be a primary key. If I am following
what you want is a staff id that identifies a particular staff member
across rows and is derived from the (Name, Total Salary) combination, is
that correct? If so you could use a serial column to generate a
surrogate primary key for each row without worrying about the names and
total salary. Then it becomes an issue of generating the staff id for
unique staff members. For that I would see John McKowns answer.

However, our team is small and between us we have
> very little experience working with databases and we haven't found a way
> to accomplish this goal yet. In fact, we may be trying to solve this in
> the wrong way entirely.
>
> So, to put it succinctly, how would you approach this problem? What are
> our options? Do we need to write a script to clean the data into
> separate csv tables before we import it to postgres, or is this
> something we can do in postgres? We'd really appreciate any help you all
> may be able to offer.
>
> Best!
> Eli Murray
>
>

--
Adrian Klaver
adrian(dot)klaver(at)aklaver(dot)com

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message pinker 2015-03-09 14:08:37 Re: VACUUM FULL doesn't reduce table size
Previous Message Adrian Klaver 2015-03-09 13:17:02 Re: pg_conndefaults Returning empty string