Re: How to skip duplicate records while copying from CSV to table in Postgresql using "COPY"

From:	Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
To:	Arup Rakshit <aruprakshit(at)rocketmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject:	Re: How to skip duplicate records while copying from CSV to table in Postgresql using "COPY"
Date:	2015-05-24 14:52:43
Message-ID:	5561E5BB.4030408@aklaver.com
Lists:	pgsql-general

On 05/24/2015 06:24 AM, Arup Rakshit wrote:
> On Sunday, May 24, 2015 07:24:41 AM you wrote:
>> On 05/24/2015 04:55 AM, Arup Rakshit wrote:
>>> On Sunday, May 24, 2015 02:52:47 PM you wrote:
>>>> On Sun, 2015-05-24 at 16:56 +0630, Arup Rakshit wrote:
>>>>> Hi,
>>>>>
>>>>> I am copying data from a CSV file into a table using the "COPY" command.
>>>>> One thing I am stuck on is how to skip duplicate records while
>>>>> copying from CSV to tables. From the documentation, it seems
>>>>> PostgreSQL doesn't have any built-in tool to handle this with the "COPY"
>>>>> command. Searching Google, I found the idea below of using a temp table.
>>>>>
>>>>> http://stackoverflow.com/questions/13947327/to-ignore-duplicate-keys-during-copy-from-in-postgresql
>>>>>
>>>>> I am also wondering whether I could let the records get inserted and then
>>>>> delete the duplicate records from the table, as this post suggested:
>>>>> http://www.postgresql.org/message-id/37013500.DFF0A64A@manhattanproject.com.
>>>>>
>>>>> Both solutions look like doing double work, and I am not sure
>>>>> which is the better one here. Can anybody suggest which approach
>>>>> I should adopt? Or if you have any better ideas for this task,
>>>>> please share.
>>>>
>>>> Assuming you are using Unix, or can install Unix tools, run the input
>>>> files through
>>>>
>>>> sort -u
>>>>
>>>> before passing them to COPY.
>>>>
>>>> Oliver Elphick
>>>>
>>>
>>> I think I need to ask in a more specific way. I have a table, say `table1`, which I feed with data from different CSV files. Suppose I have inserted N records into `table1` from CSV file `c1`. That is fine. The next time, when I import from a different CSV file, say `c2`, into `table1`, I don't want to reinsert any record from the new CSV file that `table1` already contains.
>>>
>>> How to do this?
>>
>> As others have pointed out this depends on what you are considering a
>> duplicate.
>>
>> Is it if the entire row is duplicated?
>
> It is entire row.

So, Oliver's second solution.
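
For the whole-row case, the `sort -u` idea quoted above can be stretched to cover rows that are already loaded. What follows is only a rough sketch under stated assumptions, and not necessarily the solution referred to above: the function name `new_rows` and the file names are hypothetical, the CSV files have no header line and no fields containing embedded newlines, and the rows already in `table1` have been exported to a file whose formatting (column order, quoting) matches the incoming file byte for byte.

```shell
#!/bin/sh
# new_rows EXISTING INCOMING
# Print the rows of INCOMING that do not appear in EXISTING, with duplicates
# inside INCOMING collapsed as well. Both arguments are hypothetical file
# names: EXISTING holds the rows already in the table (for example exported
# with psql's \copy table1 to 'existing.csv' csv), INCOMING is the new file.
new_rows() {
    existing=$1
    incoming=$2
    tmp_old=$(mktemp)
    tmp_new=$(mktemp)

    # comm needs both inputs sorted in the same collation, so force the C
    # locale everywhere; sort -u also drops duplicates within each file.
    LC_ALL=C sort -u "$existing" > "$tmp_old"
    LC_ALL=C sort -u "$incoming" > "$tmp_new"

    # -1 hides lines found only in the first file, -3 hides lines found in
    # both, so only the lines unique to the incoming file are printed.
    LC_ALL=C comm -13 "$tmp_old" "$tmp_new"

    rm -f "$tmp_old" "$tmp_new"
}
```

Something like `new_rows existing.csv c2.csv > c2_new.csv` would then leave only the unseen rows in `c2_new.csv`, which can be passed to COPY as usual. The comparison is purely textual, so rows that are equal as data but formatted differently (for instance quoted in one file and not the other) would still be treated as distinct.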

>
>> Or if some portion of the row (a 'primary key') is duplicated?
>>
>>>
>>> I see now that my SO link is not a solution to my problem.
>>>
>>
>>
>>
>

--
Adrian Klaver
adrian(dot)klaver(at)aklaver(dot)com

In response to

Re: How to skip duplicate records while copying from CSV to table in Postgresql using "COPY" at 2015-05-24 13:24:56 from Arup Rakshit

Responses

Re: How to skip duplicate records while copying from CSV to table in Postgresql using "COPY" at 2015-05-24 14:18:03 from Arup Rakshit
