| From: | Peter Eisentraut <peter_e(at)gmx(dot)net> |
|---|---|
| To: | Lee Kindness <lkindness(at)csl(dot)co(dot)uk> |
| Cc: | <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: Bulkloading using COPY - ignore duplicates? |
| Date: | 2001-12-13 18:20:18 |
| Message-ID: | Pine.LNX.4.30.0112131714310.647-100000@peter.localdomain |
| Lists: | pgsql-hackers |
Lee Kindness writes:
> 1. Performance enhancements when doing bulk inserts - pre or
> post processing the data to remove duplicates is very time
> consuming. Likewise the best tool should always be used for the job
> at hand, and for searching/removing things it's a database.
Arguably, a better tool for this is sort(1). For instance, if you have a
typical copy input file with tab-separated fields and the primary key is
in columns 1 and 2, you can remove duplicates with
sort -k 1,2 -u INFILE > OUTFILE
To get a record of what duplicates were removed, use diff.
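For example, a minimal sketch (same placeholder INFILE/OUTFILE names as above, assuming OUTFILE was produced by the sort -u command):
sort -k 1,2 INFILE | diff - OUTFILE
Lines marked '<' in the diff output are rows present in the sorted input but absent from the de-duplicated output, i.e. the duplicates that were dropped.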
--
Peter Eisentraut peter_e(at)gmx(dot)net
| | From | Date | Subject |
|---|---|---|---|
| Next Message | Doug McNaught | 2001-12-13 18:20:45 | Re: Platform testing (last call?) |
| Previous Message | Neil Padgett | 2001-12-13 18:01:54 | Re: Platform testing (last call?) |