From: | Lee Kindness <lkindness(at)csl(dot)co(dot)uk> |
---|---|
To: | Peter Eisentraut <peter_e(at)gmx(dot)net> |
Cc: | Lee Kindness <lkindness(at)csl(dot)co(dot)uk>, Jim Buttafuoco <jim(at)buttafuoco(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Bulkloading using COPY - ignore duplicates? |
Date: | 2001-12-18 10:09:14 |
Message-ID: | 15391.5578.336203.295826@elsick.csl.co.uk |
Lists: | pgsql-hackers |
Peter Eisentraut writes:
> Lee Kindness writes:
> > Consider SELECT DISTINCT - which is the 'duplicate' and which one is
> > the good one?
> It's not the same thing. SELECT DISTINCT only eliminates rows that are
> completely the same, not rows that are merely equal in their unique constraints.
> Maybe you're thinking of SELECT DISTINCT ON (). Observe the big warning
> that the results of that statement are random unless ORDER BY is used. --
> But that's not the same thing either. We've never claimed that the COPY
> input has an ordering assumption. In fact you're asking for a bit more
> than an ordering assumption, you're saying that the earlier data is better
> than the later data. I think in a random use case that is more likely
> *not* to be the case because the data at the end is newer.
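To make the distinction in the quoted paragraph concrete, here is a rough Python model (not PostgreSQL code) of the two behaviours. `distinct` drops only fully identical rows, while `distinct_on` keeps one row per key value. Both function names are hypothetical, and modelling DISTINCT ON as "the first row in input order wins" is a simplification; in PostgreSQL the surviving row is only predictable when ORDER BY is given.

```python
def distinct(rows):
    """Model of SELECT DISTINCT: drop only rows that are entirely identical."""
    seen = set()
    out = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out


def distinct_on(rows, key):
    """Model of SELECT DISTINCT ON (key): keep one row per key value.

    Which row survives depends on input order; here the first one seen wins,
    standing in for whatever ORDER BY (or the lack of it) would produce.
    """
    seen = set()
    out = []
    for row in rows:
        k = key(row)
        if k not in seen:
            seen.add(k)
            out.append(row)
    return out


rows = [(1, "old"), (1, "new"), (2, "x"), (2, "x")]
print(distinct(rows))                                          # [(1, 'old'), (1, 'new'), (2, 'x')]
print(distinct_on(rows, key=lambda r: r[0]))                   # [(1, 'old'), (2, 'x')]
print(distinct_on(list(reversed(rows)), key=lambda r: r[0]))   # [(2, 'x'), (1, 'new')]
```

Reversing the input changes which row survives for key `1`, which is exactly the ordering question raised above: an "ignore duplicates" rule silently decides whether the earlier or the later data wins.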
You're right - I meant 'SELECT DISTINCT ON ()'. However, I'm only
using it as an example of where the database chooses (albeit
randomly) which data to discard. While I've said in this thread that
'COPY FROM IGNORE DUPLICATES' would ignore later duplicates, I'm not
really that concerned about which ones it ignores: first, later, random,
... I agree that if it were a concern, the data should be pre-processed.
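If the choice of which duplicate survives does matter, the pre-processing step could look roughly like this Python sketch. `dedupe_for_copy` is a hypothetical helper, not part of PostgreSQL; it assumes plain tab-separated lines without COPY's backslash escapes, treats the listed columns as the unique key, and returns the discarded rows separately so they could be logged, in the spirit of Oracle's logging option.

```python
def dedupe_for_copy(lines, key_cols, keep="first"):
    """Hypothetical pre-processing step before COPY: drop rows whose key repeats.

    lines    -- tab-separated text lines (assumes no COPY backslash escapes)
    key_cols -- indexes of the columns that form the unique key
    keep     -- "first" keeps the earliest occurrence, "last" the latest

    Returns (kept, discarded); both preserve the original input order.
    """
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first' or 'last'")
    rows = [line.rstrip("\n").split("\t") for line in lines]
    # Scanning backwards makes "first seen" mean "last in the file".
    ordered = rows if keep == "first" else list(reversed(rows))
    seen = set()
    kept, discarded = [], []
    for row in ordered:
        k = tuple(row[i] for i in key_cols)
        if k in seen:
            discarded.append(row)
        else:
            seen.add(k)
            kept.append(row)
    if keep == "last":
        kept.reverse()       # restore original input order
        discarded.reverse()
    return kept, discarded


data = ["1\told\n", "2\tx\n", "1\tnew\n"]
kept, dropped = dedupe_for_copy(data, key_cols=[0], keep="first")
print(kept, dropped)   # [['1', 'old'], ['2', 'x']] [['1', 'new']]
kept, dropped = dedupe_for_copy(data, key_cols=[0], keep="last")
print(kept, dropped)   # [['2', 'x'], ['1', 'new']] [['1', 'old']]
```

The kept rows can be re-joined with `"\t".join(row)` and fed to `COPY ... FROM STDIN`; the discarded list is where a logging option would write the rejected rows, or it can simply be thrown away.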
> Btw., here's another concern about this proposed feature: If I do
> a client-side COPY, how will you send the "ignored" rows back to
> the client?
Again, a number of different ideas have been mixed up in the
discussion. Oracle's logging option was given only as an example of
how other database systems handle this; if the option isn't
explicitly given, then it's reasonable to discard the extra
information.
What really would be nice in the SQL-world is a standardised COPY
statement...
Best regards, Lee Kindness.