Lee Kindness writes:
> 1. Performance enhancements when doing bulk inserts - pre or
> post processing the data to remove duplicates is very time
> consuming. Likewise the best tool should always be used for the job
> at hand, and for searching/removing things it's a database.
Arguably, a better tool for this is sort(1). For instance, if you have a
typical copy input file with tab-separated fields and the primary key is
in columns 1 and 2, you can remove duplicates with:

    sort -k 1,2 -u INFILE > OUTFILE
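
Since sort's default key splitting is on runs of whitespace, a column that
itself contains spaces could shift the key fields. If that can happen, it is
safer to give sort the tab separator explicitly (a sketch assuming a POSIX
sort; printf is just a portable way to produce a literal tab):

    sort -t "$(printf '\t')" -k 1,2 -u INFILE > OUTFILE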
To get a record of what duplicates were removed, use diff.
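
For instance (a sketch, assuming OUTFILE was produced by the command above):
sort the original file on the same key but without -u and compare it against
the deduplicated output; the lines diff marks with "<" are the rows dropped.

    sort -k 1,2 INFILE | diff - OUTFILE | grep '^<'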
--
Peter Eisentraut peter_e(at)gmx(dot)net