Lee Kindness writes:
> 1. Performance enhancements when doing bulk inserts - pre or
> post processing the data to remove duplicates is very time
> consuming. Likewise the best tool should always be used for the job
> at hand, and for searching/removing things it's a database.
Arguably, a better tool for this is sort(1). For instance, if you have a
typical copy input file with tab-separated fields and the primary key is
in columns 1 and 2, you can remove duplicates with:

    sort -k 1,2 -u INFILE > OUTFILE
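
Since sort's default key splitting is on runs of whitespace, a column that
itself contains spaces could shift the key fields. If that can happen, it is
safer to give sort the tab separator explicitly (a sketch assuming a POSIX
sort; printf is just a portable way to produce a literal tab):

    sort -t "$(printf '\t')" -k 1,2 -u INFILE > OUTFILE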
To get a record of what duplicates were removed, use diff.
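
For instance (a sketch, assuming OUTFILE was produced by the command above):
sort the original file on the same key but without -u and compare it against
the deduplicated output; the lines diff marks with "<" are the rows dropped.

    sort -k 1,2 INFILE | diff - OUTFILE | grep '^<'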
--
Peter Eisentraut peter_e(at)gmx(dot)net