| From: | Hannu Krosing <hannu(at)tm(dot)ee> |
|---|---|
| To: | Lee Kindness <lkindness(at)csl(dot)co(dot)uk> |
| Cc: | Patrick Welche <prlw1(at)newn(dot)cam(dot)ac(dot)uk>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org |
| Subject: | Re: Bulkloading using COPY - ignore duplicates? |
| Date: | 2001-12-13 13:59:33 |
| Message-ID: | 3C18B445.3C7774D9@tm.ee |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Lee Kindness wrote:
>
> Patrick Welche writes:
> > On Mon, Oct 01, 2001 at 03:17:43PM +0100, Lee Kindness wrote:
> > > Please, don't get me wrong - I don't want to come across arrogant. I'm
> > > simply trying to improve the 'COPY FROM' command in a situation where
> > > speed is a critical issue and the data is dirty... And that must be a
> > > relatively common scenario.
> > Isn't that when you do your bulk copy into into a holding table, then
> > clean it up, and then insert into your live system?
>
> That's what I'm currently doing as a workaround - a SELECT DISTINCT
> from a temporary table into the real table with the unique index on
> it. However this takes absolute ages - say 5 seconds for the copy
> (which is the ballpark figure I aiming toward and can achieve with
> Ingres) plus another 30ish seconds for the SELECT DISTINCT.
>
> The majority of database systems out there handle this situation in
> one manner or another (MySQL ignores or replaces; Ingres ignores;
> Oracle ignores or logs; others...). Indeed PostgreSQL currently checks
> for duplicates in the COPY code but throws an elog(ERROR) rather than
> ignoring the row, or passing the error back up the call chain.
I guess postgresql will be able to do it once savepoints get
implemented.
> My use of PostgreSQL is very time critical, and sadly this issue alone
> may force an evaluation of Oracle's performance in this respect!
Can't you clean the duplicates _outside_ postgresql, say
cat dumpfile | sort | uniq | psql db -c 'copy mytable from stdin'
with your version of uniq.
or perhaps
psql db -c 'copy mytable to stdout' >> dumpfile
sort dumpfile | uniq | psql db -c 'copy mytable from stdin'
if you already have something in mytable.
------------
Hannu
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Jason Tishler | 2001-12-13 14:05:28 | Re: Platform Testing - Cygwin |
| Previous Message | Vince Vielhaber | 2001-12-13 13:57:17 | Re: Third call for platform testing |