Re: error-tolerant COPY FROM

From: Christopher Browne <cbbrowne(at)ca(dot)afilias(dot)info>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: error-tolerant COPY FROM
Date: 2005-02-04 22:35:55
Message-ID: 60zmyk7y2c.fsf@dba2.int.libertyrms.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

joolz(at)arbodienst-limburg(dot)nl ("Joolz") writes:

> Hello everyone,
>
> I'm building a postgresql db which will have to get lots of data
> from "the outside" (customers, that is). The db has lots of
> constraints, and I'm sure that our customers will offer lots of
> invalid information. We receive the information in csv format. My
> first thought was to read them into the database with COPY, but
> "COPY stops operation at the first error."
>
> What I need is an import where all valid lines from the csv files
> are read into the db, and I also get a logfile for all invalid
> lines, stating the line number plus the pg error message so I can
> see which constraint was violated.
>
> I can't think of a direct, elegant solution for this, does anyone
> have any suggestions? Thanks a lot!

First, load it into a table that has really, really _weak_
constraints, but which also adds a sequence column at the end that
gets automagically populated.

- The sequence gets you the "line numbers" you want.

- You then run queries that separate the "good" from the "crud."

This might ultimately turn into having 9 tables:

- The original data, that you NEVER touch again, as the "pristine"
form to look at if you find a problem;

- 3 tables that collect entries with 3 different sorts of problems
that mandate discarding the data [well, sending it back for
retreading...]

- 4 tables that contain entries that could get rewritten in some
automatic fashion based on 4 policies you discovered

- 1 "final results" table that aggregates those 4 tables along with
the entries that were OK, which are fine to use as the resulting data
import that can safely go into your application.

I'm making up numbers, but you hopefully get the idea...
--
"cbbrowne","@","ca.afilias.info"
<http://dev6.int.libertyrms.com/>
Christopher Browne
(416) 673-4124 (land)

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2005-02-04 22:44:41 Re: Safely Killing Backends (Was: Applications that leak connections)
Previous Message Alvaro Herrera 2005-02-04 22:25:06 Re: Safely Killing Backends (Was: Applications that leak connections)