Re: How to skip duplicate records while copying from CSV to table in Postgresql using "COPY"

From: Arup Rakshit <aruprakshit(at)rocketmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: How to skip duplicate records while copying from CSV to table in Postgresql using "COPY"
Date: 2015-05-24 13:24:56
Message-ID: 1720349.nDRETasvJp@linux-wzza.aruprakshit
Lists: pgsql-general

On Sunday, May 24, 2015 07:24:41 AM you wrote:
> On 05/24/2015 04:55 AM, Arup Rakshit wrote:
> > On Sunday, May 24, 2015 02:52:47 PM you wrote:
> >> On Sun, 2015-05-24 at 16:56 +0630, Arup Rakshit wrote:
> >>> Hi,
> >>>
> >>> I am copying data from a CSV file into a table using the "COPY" command.
> >>> One thing I am stuck on is how to skip duplicate records while
> >>> copying from CSV to tables. From the documentation, it seems
> >>> PostgreSQL doesn't have any built-in tool to handle this with the "COPY"
> >>> command. Googling turned up one idea, below: use a temp table.
> >>>
> >>> http://stackoverflow.com/questions/13947327/to-ignore-duplicate-keys-during-copy-from-in-postgresql
> >>>
> >>> I am also considering letting the records get inserted and then
> >>> deleting the duplicate records from the table, as this post suggests:
> >>> http://www.postgresql.org/message-id/37013500.DFF0A64A@manhattanproject.com.
> >>>
> >>> Both solutions look like doing double work, and I am not sure
> >>> which is better here. Can anybody suggest which approach
> >>> I should adopt? Or if you have any better ideas for this task,
> >>> please share.
> >>
> >> Assuming you are using Unix, or can install Unix tools, run the input
> >> files through
> >>
> >> sort -u
> >>
> >> before passing them to COPY.
> >>
> >> Oliver Elphick
> >>
> >
> > I think I need to ask in a more specific way. I have a table, say `table1`, which I feed with data from different CSV files. Suppose I have inserted N records into `table1` from CSV file `c1`. That is fine. The next time, when I import from a different CSV file, say `c2`, into `table1`, I don't want to reinsert any record from the new file that `table1` already has.
> >
> > How to do this?
>
> As others have pointed out, this depends on what you consider a
> duplicate.
>
> Is it if the entire row is duplicated?

It is the entire row.

> Or if some portion of the row (a 'primary key') is duplicated?
>
> >
> > I see now that my SO link is not a solution to my problem.
> >
>
>
>
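
Since a duplicate here means the entire row, the `sort -u` suggestion quoted above could perhaps be extended to the `c1`/`c2` case by filtering the new file against what was already loaded. The following is only a rough sketch under several assumptions: the function name `new_rows` is made up for illustration, the CSV files have no header line and no quoted fields containing newlines, duplicate rows are byte-for-byte identical lines, and `mktemp` is available. The rows already loaded could be the earlier file `c1` or an export of `table1`.

```shell
# new_rows OLD NEW (hypothetical helper): print the lines of NEW that do
# not occur in OLD. Duplicates inside NEW are collapsed as well.
new_rows() {
    old_sorted=$(mktemp)
    new_sorted=$(mktemp)

    # comm needs both inputs sorted under the same collation, so the C
    # locale is forced for every step.
    LC_ALL=C sort -u "$1" > "$old_sorted"
    LC_ALL=C sort -u "$2" > "$new_sorted"

    # -1 drops lines found only in OLD, -3 drops lines found in both,
    # leaving the lines found only in NEW.
    LC_ALL=C comm -13 "$old_sorted" "$new_sorted"

    rm -f "$old_sorted" "$new_sorted"
}
```

Running `new_rows c1.csv c2.csv > c2.new.csv` (file names assumed) would leave only the unseen rows in `c2.new.csv`, which could then be passed to COPY. Note that the output comes back sorted, not in the original file order.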

--
================
Regards,
Arup Rakshit
================
Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.

--Brian Kernighan
