From: | Lee Kindness <lkindness(at)csl(dot)co(dot)uk> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Lee Kindness <lkindness(at)csl(dot)co(dot)uk>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Bulkloading using COPY - ignore duplicates? |
Date: | 2001-10-01 13:54:25 |
Message-ID: | 15288.30097.74590.206271@elsick.csl.co.uk |
Lists: | pgsql-hackers |
Tom Lane writes:
> Lee Kindness <lkindness(at)csl(dot)co(dot)uk> writes:
> > Would this seem a reasonable thing to do? Does anyone rely on COPY
> > FROM causing an ERROR on duplicate input?
> Yes. This change will not be acceptable unless it's made an optional
> (and not default, IMHO, though perhaps that's negotiable) feature of
> COPY.
I see where you're coming from, but seriously what's the use/point of
COPY aborting and doing a rollback if one duplicate key is found? I
think it's quite reasonable to presume the input to COPY has had as
little processing done on it as possible. I could loop through the
input file before sending it to COPY but that's just wasting cycles
and effort - Postgres has btree lookup built in, I don't want to roll
my own before giving Postgres my input file!
> The implementation might be rather messy too. I don't much care
> for the notion of a routine as low-level as bt_check_unique knowing
> that the context is or is not COPY. We might have to do some
> restructuring.
Well in reality it wouldn't be "you're getting run from copy" but
rather "notice on duplicate, rather than error & exit". There is a
telling comment in nbtinsert.c just before _bt_check_unique() is
called:
    /*
     * If we're not allowing duplicates, make sure the key isn't already
     * in the index. XXX this belongs somewhere else, likely
     */
So perhaps dupes should be searched for before _bt_doinsert is called,
or somewhere more appropriate?
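
To make the "notice on duplicate, rather than error & exit" idea concrete, here is a minimal sketch in C. It is not PostgreSQL code: the sorted array stands in for a unique index, and every name in it (DupPolicy, KeySet, keyset_insert, keyset_bulk_load) is hypothetical. The only point is that the uniqueness check receives a policy from its caller and never needs to know whether it is being run from COPY.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical names throughout; this is a toy, not nbtinsert.c. */
typedef enum { ON_DUP_ERROR, ON_DUP_SKIP } DupPolicy;
typedef enum { INS_OK, INS_SKIPPED, INS_FAILED } InsertResult;

#define KEYSET_CAPACITY 64

typedef struct {
    int    keys[KEYSET_CAPACITY];   /* kept sorted ascending */
    size_t count;
} KeySet;

/* Binary search: index of the first element >= key; *found set on a match. */
static size_t keyset_lower_bound(const KeySet *set, int key, bool *found)
{
    size_t lo = 0, hi = set->count;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (set->keys[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    *found = (lo < set->count && set->keys[lo] == key);
    return lo;
}

/* Insert one key; the caller-supplied policy decides what a duplicate means. */
static InsertResult keyset_insert(KeySet *set, int key, DupPolicy policy)
{
    bool   found;
    size_t pos;

    assert(set != NULL);
    pos = keyset_lower_bound(set, key, &found);

    if (found) {
        if (policy == ON_DUP_SKIP) {
            fprintf(stderr, "NOTICE: duplicate key %d skipped\n", key);
            return INS_SKIPPED;
        }
        fprintf(stderr, "ERROR: duplicate key %d\n", key);
        return INS_FAILED;
    }
    if (set->count == KEYSET_CAPACITY)
        return INS_FAILED;          /* toy structure is full */

    /* Shift the tail right by one slot to keep the array sorted. */
    for (size_t i = set->count; i > pos; i--)
        set->keys[i] = set->keys[i - 1];
    set->keys[pos] = key;
    set->count++;
    return INS_OK;
}

/*
 * Bulk load: stops at the first failure and returns the number of keys
 * actually inserted. Unlike a real COPY, nothing is rolled back here.
 */
static size_t keyset_bulk_load(KeySet *set, const int *input, size_t n,
                               DupPolicy policy)
{
    size_t loaded = 0;

    for (size_t i = 0; i < n; i++) {
        InsertResult r = keyset_insert(set, input[i], policy);

        if (r == INS_FAILED)
            break;
        if (r == INS_OK)
            loaded++;
    }
    return loaded;
}
```

Under these assumptions, loading the keys 5, 3, 5, 9, 3, 1 with ON_DUP_SKIP inserts four keys and emits two notices, while ON_DUP_ERROR stops at the second 5 after inserting two keys. Whether such a policy would come from a COPY option or from a SET variable is exactly the open question in this thread.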
> > Would:
> > WITH ON_DUPLICATE = CONTINUE|TERMINATE (or similar)
> > need to be added to the COPY command (I hope not)?
> It occurs to me that skip-the-insert might be a useful option for
> INSERTs that detect a unique-key conflict, not only for COPY. (Cf.
> the regular discussions we see on whether to do INSERT first or
> UPDATE first when the key might already exist.) Maybe a SET variable
> that applies to all forms of insertion would be appropriate.
That makes quite a bit of sense.
--
Lee Kindness, Senior Software Engineer
Concept Systems Limited.