From: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
To: JPLapham <lapham(at)jandr(dot)org>, pgsql-general(at)postgresql(dot)org
Subject: Re: Using the database to validate data
Date: 2015-07-23 18:02:06
Message-ID: 55B12C1E.9080500@aklaver.com
Lists: pgsql-general
On 07/23/2015 05:55 AM, JPLapham wrote:
> Hello,
>
> I have an application that occasionally performs large batch inserts of user
> hand-generated data. Input is a tab-delimited file with typically hundreds
> to a thousand lines of data.
>
> Because the data is generated by hand, there are always many
> transaction-stopping errors in a typical input run. For example, a missing
> datum in a NOT NULL column, a duplicate value in a UNIQUE column, a data type
> mismatch, a FOREIGN KEY reference to something non-existent, etc. Of course,
> we chose PostgreSQL exactly because of problems like these: it gives us robust
> transactional control, rollback on errors, etc.
>
> My question is the following. I would like to *test* the data input for
> integrity in such a way that I can create a report to the user informing
> them of exactly where in their input file to correct the problems.
>
> IDEA 1: My first attempt at this was to simply slurp the data into the
> database, collect the errors, and then rollback. Of course (as I now know),
> this doesn't work because after the first problem, the database reports,
> "current transaction is aborted, commands ignored until end of transaction
> block". This means that I can only report to the user the location of the
> first problem, and then they run the data again, and keep looping through
> the process until the data is good, a huge waste of time.
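One way past the aborted-transaction problem is a subtransaction per row:
wrap each INSERT in a SAVEPOINT, roll back to the savepoint on error, and
keep going. A rough, untested sketch, with the table and column names
invented for illustration:

  BEGIN;

  SAVEPOINT row_1;
  INSERT INTO target (id, name) VALUES (1, 'alpha');
  RELEASE SAVEPOINT row_1;          -- row stays pending, still uncommitted

  SAVEPOINT row_2;
  INSERT INTO target (id, name) VALUES (1, 'beta');
  -- unique_violation here: log "line 2" for the report, then
  ROLLBACK TO SAVEPOINT row_2;      -- transaction is still usable

  -- ... one savepoint per input line, driven from the client ...

  ROLLBACK;                         -- nothing ever reaches the table

Because the good rows stay pending inside the transaction until the final
ROLLBACK, a later duplicate in the file is caught against them, which also
covers the UNIQUE blind spot you describe in IDEA 2 below.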
>
> IDEA 2: My second idea on how to do this was to ROLLBACK after each INSERT.
> This allows me to check for things like NOT NULL and data type issues, but
> not violations of UNIQUE within the new data.
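That visibility problem disappears if you load the raw file into a
constraint-free staging table and run the checks as plain queries; each
query reports every offending line in one pass, not just the first. Again
only a sketch: stage, target, parent_table, the column names, and
input.tsv are all made up:

  -- all text columns, so nothing is rejected at load time;
  -- line_no is filled by the sequence in file order
  CREATE TEMP TABLE stage (line_no serial, id text, name text, parent text);
  \copy stage (id, name, parent) FROM 'input.tsv'

  -- would-be NOT NULL violations (empty fields load as empty strings)
  SELECT line_no FROM stage WHERE name IS NULL OR name = '';

  -- would-be data type mismatches
  SELECT line_no FROM stage WHERE id !~ '^[0-9]+$';

  -- duplicates within the file itself
  SELECT id, array_agg(line_no) FROM stage GROUP BY id HAVING count(*) > 1;

  -- FOREIGN KEY references to something non-existent
  SELECT s.line_no
  FROM stage s LEFT JOIN parent_table p ON p.id::text = s.parent
  WHERE s.parent IS NOT NULL AND s.parent <> '' AND p.id IS NULL;

A duplicate check against rows already in target is one more join along
the same lines.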
>
> IDEA 3: Write my own data pre-conditioner. Ugh, what a nightmare, I feel
> like I'm writing my own database! Checking for FKEY constraints, UNIQUE, etc.
> is not trivial. It seems ridiculous to do this when I have the *actual*
> database available to test against!
That has already been done:
>
> Has anyone dealt with this kind of issue before? What are your opinions on
> best practice for this? Of course I do not want to actually COMMIT until the
> data is perfect!
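If you want the actual constraints to do the testing, a PL/pgSQL loop can
drive the inserts with one subtransaction per row (the BEGIN/EXCEPTION
block) and then throw at the end so nothing commits. A sketch, reusing the
invented stage/target names from above:

  DO $$
  DECLARE
      r record;
  BEGIN
      FOR r IN SELECT * FROM stage ORDER BY line_no LOOP
          BEGIN
              -- any failure here (NOT NULL, UNIQUE, FK, bad cast)
              -- aborts only this block, not the whole loop
              INSERT INTO target (id, name) VALUES (r.id::int, r.name);
          EXCEPTION WHEN others THEN
              RAISE NOTICE 'line %: %', r.line_no, SQLERRM;
          END;
      END LOOP;
      -- abort on purpose so the successful rows never commit
      RAISE EXCEPTION 'validation pass only';
  END $$;

The NOTICEs give you the whole error report in one run, and since the good
rows stay pending until the final RAISE, duplicates within the file are
reported too.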
>
> Thanks for your time!
> -Jon
>
>
>
> --
> View this message in context: http://postgresql.nabble.com/Using-the-database-to-validate-data-tp5859046.html
> Sent from the PostgreSQL - general mailing list archive at Nabble.com.
>
>
--
Adrian Klaver
adrian(dot)klaver(at)aklaver(dot)com