Re: Using the database to validate data

From: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
To: JPLapham <lapham(at)jandr(dot)org>, pgsql-general(at)postgresql(dot)org
Subject: Re: Using the database to validate data
Date: 2015-07-23 18:02:06
Message-ID: 55B12C1E.9080500@aklaver.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 07/23/2015 05:55 AM, JPLapham wrote:
> Hello,
>
> I have an application that occasionally performs large batch inserts of user
> hand-generated data. Input is a tab delimited file with typically hundreds
> to a thousand lines of data.
>
> Because the data is generated by hand, there are always many
> transaction-stopping errors in a typical input run. For example, missing
> datum in a NOT NULL column, a duplicate value in a UNIQUE column, data type
> mismatch, FOREIGN KEY reference to something non-existing, etc. Of course,
> we chose PostgreSQL exactly because of these problems, because of the robust
> transactional control, rollback on errors, etc.
>
> My question is the following. I would like to *test* the data input for
> integrity in such a way that I can create a report to the user informing
> them of exactly where in their input file to correct the problems.
>
> IDEA 1: My first attempt at this was to simply slurp the data into the
> database, collect the errors, and then rollback. Of course (as I now know),
> this doesn't work because after the first problem, the database reports,
> "current transaction is aborted, commands ignored until end of transaction
> block". This means that I can only report to the user the location of the
> first problem, and then they run the data again, and keep looping through
> the process until the data is good, a huge waste of time.
>
> IDEA 2: My second idea on how to do this was to ROLLBACK after each INSERT.
> This allows me to check for things like NOT NULL and data type issues, but
> not violations of UNIQUE within the new data.
>
> IDEA 3: Write my own data pre-conditioner. Ugh, what a nightmare, I feel
> like I'm writing my own database! Checking for FKEY constraints, UNIQUE, etc
> is not trivial. It seems ridiculous to do this when I have the *actual*
> database available to test against!

That has already been done:

http://pgloader.io/

>
> Has anyone dealt with this kind of issue before? What are your opinions on
> best practice for this? Of course I do not want to actually COMMIT until the
> data is perfect!
>
> Thanks for your time!
> -Jon
>
>
>
> --
> View this message in context: http://postgresql.nabble.com/Using-the-database-to-validate-data-tp5859046.html
> Sent from the PostgreSQL - general mailing list archive at Nabble.com.
>
>

--
Adrian Klaver
adrian(dot)klaver(at)aklaver(dot)com

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Andres Freund 2015-07-23 18:05:52 Re: Delete rule does not prevent truncate
Previous Message Dane Foster 2015-07-23 18:02:04 Re: Q: text query search and