From: | Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> |
---|---|
To: | Lee Kindness <lkindness(at)csl(dot)co(dot)uk> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Jim Buttafuoco <jim(at)buttafuoco(dot)net>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Bulkloading using COPY - ignore duplicates? |
Date: | 2002-01-02 21:09:36 |
Message-ID: | 200201022109.g02L9aW27520@candle.pha.pa.us |
Lists: | pgsql-hackers |
Lee Kindness wrote:
> Tom Lane writes:
> > Lee Kindness <lkindness(at)csl(dot)co(dot)uk> writes:
> > > In an ideal world 'COPY FROM' would only be used with data output by
> > > 'COPY TO' and it would be nice and sanitised. However in some fields
> > > this often is not a possibility due to performance constraints!
> > Of course, the more bells and whistles we add to COPY, the slower it
> > will get, which rather defeats the purpose no?
>
> Indeed, but as I've mentioned in this thread in the past, the code
> path for COPY FROM already does a check against the unique index (if
> there is one) but bombs-out rather than handling it...
>
> It wouldn't add any execution time if there were no duplicates in the
> input!
I know many purists object to letting COPY discard invalid input rows,
but we get lots of requests for this feature, and there are few
workarounds other than pre-processing the flat file. Of course, if they
use INSERT, they get errors that they can simply ignore. I don't see
how tolerating errors in COPY is any less legitimate, except that COPY
is one command while multiple INSERTs are separate commands.
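
For reference, the pre-processing workaround mentioned above can be sketched roughly as follows. This is only an illustration in Python, assuming a tab-delimited flat file whose unique key is the first column; the function name `drop_duplicate_keys` and its parameters are hypothetical. It only removes duplicates within the file itself and cannot see rows already in the table.

```python
def drop_duplicate_keys(lines, key_columns=(0,), delimiter="\t"):
    """Keep the first line seen for each key; report the rest.

    Assumption: the key is made of whole columns of a delimited line.
    All keys are held in memory, so this suits files whose key set fits in RAM.
    Returns (kept_lines, dropped_line_numbers), line numbers being 1-based.
    """
    seen = set()
    kept = []
    dropped = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split(delimiter)
        key = tuple(fields[i] for i in key_columns)
        if key in seen:
            # A later line repeats an earlier key: drop it, remember where.
            dropped.append(lineno)
        else:
            seen.add(key)
            kept.append(line)
    return kept, dropped
```

The kept lines could then be written to a new file and loaded with a plain COPY, at the cost of an extra pass over the data.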
Seems we need to allow such a capability, if only crudely. I don't
think we can create a discard file because of the problem with remote
COPY.
I think we can allow something like:
    COPY FROM '/tmp/x' WITH ERRORS 2
meaning we will allow at most two errors and will report the error line
numbers to the user. I think this syntax clearly indicates that errors
are being accepted in the input. An alternate syntax would allow an
unlimited number of errors:
    COPY FROM '/tmp/x' WITH ERRORS
The errors can be unique-index violations, or even CHECK constraint errors.
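
To make the intended behaviour concrete, here is a minimal sketch of the idea in Python, not backend code. It assumes tab-delimited input, models the unique index as an in-memory set, and models a CHECK constraint as a predicate. The names `copy_with_errors`, `TooManyErrors`, `key`, `check` and `max_errors` are all hypothetical, and aborting once the limit is exceeded is an assumption about what "at most two errors" would mean.

```python
class TooManyErrors(Exception):
    """Raised when more rows are rejected than the caller allows."""


def copy_with_errors(lines, key, check=lambda fields: True,
                     max_errors=None, delimiter="\t"):
    """Load rows, skipping bad ones and recording their line numbers.

    max_errors=None stands in for a bare WITH ERRORS (no limit);
    max_errors=2 stands in for WITH ERRORS 2.
    Returns (loaded_rows, rejected), where rejected is a list of
    (line_number, reason) tuples with 1-based line numbers.
    """
    seen = set()      # stand-in for the unique index
    loaded = []
    rejected = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split(delimiter)
        k = key(fields)
        if k in seen:
            reason = "duplicate key"
        elif not check(fields):
            reason = "check constraint"
        else:
            seen.add(k)
            loaded.append(fields)
            continue
        rejected.append((lineno, reason))
        if max_errors is not None and len(rejected) > max_errors:
            # Assumed behaviour: give up once the limit is exceeded.
            raise TooManyErrors(
                "more than %d errors; last at line %d" % (max_errors, lineno)
            )
    return loaded, rejected
```

Note that the duplicate test is the same lookup the load performs for every row anyway, so input without duplicates pays nothing extra apart from bookkeeping.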
Unless I hear complaints, I will add it to TODO.
--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026