From: JPLapham <lapham(at)jandr(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Using the database to validate data
Date: 2015-07-23 12:55:52
Message-ID: 1437656152535-5859046.post@n5.nabble.com
Lists: pgsql-general
Hello,
I have an application that occasionally performs large batch inserts of user
hand-generated data. Input is a tab delimited file with typically hundreds
to a thousand lines of data.
Because the data is generated by hand, a typical input run contains many
transaction-stopping errors: a missing datum in a NOT NULL column, a
duplicate value in a UNIQUE column, a data type mismatch, a FOREIGN KEY
reference to something non-existent, etc. Of course, we chose PostgreSQL
precisely because it catches these problems, with its robust transactional
control, rollback on errors, and so on.
My question is the following. I would like to *test* the data input for
integrity in such a way that I can create a report to the user informing
them of exactly where in their input file to correct the problems.
IDEA 1: My first attempt was simply to slurp the data into the database,
collect the errors, and then roll back. Of course (as I now know), this
doesn't work, because after the first problem the database reports "current
transaction is aborted, commands ignored until end of transaction block".
This means I can only report the location of the first problem; the user
fixes it, runs the data again, and keeps looping through the process until
the data is good, a huge waste of time.
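(One way around the aborted-transaction problem, suggested here as a sketch rather than a known-good recipe, is a SAVEPOINT per row: a failed INSERT only aborts back to its savepoint, so the surrounding transaction survives and every error can be collected, then the whole thing rolled back at the end. The demo below uses Python's stdlib sqlite3 purely so it is self-contained; the table, column names, and input rows are hypothetical, and the same SAVEPOINT / ROLLBACK TO pattern applies in PostgreSQL through any driver.)

```python
# Sketch: per-row SAVEPOINTs let one transaction survive bad rows,
# collecting every error instead of stopping at the first one.
# sqlite3 (stdlib) is used only to keep the demo self-contained;
# the identical SAVEPOINT pattern works in PostgreSQL.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None          # manage the transaction manually
cur = conn.cursor()
cur.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# Hypothetical parsed input: line 2 violates NOT NULL, line 3 violates UNIQUE.
rows = [(1, "alice"), (2, None), (1, "dup"), (3, "carol")]

errors = []
cur.execute("BEGIN")
for lineno, row in enumerate(rows, start=1):
    cur.execute("SAVEPOINT row_sp")
    try:
        cur.execute("INSERT INTO t VALUES (?, ?)", row)
    except sqlite3.Error as e:
        errors.append((lineno, str(e)))
        cur.execute("ROLLBACK TO row_sp")   # undo only this row's attempt
    cur.execute("RELEASE row_sp")
cur.execute("ROLLBACK")              # validation run only: never COMMIT

for lineno, msg in errors:
    print(f"line {lineno}: {msg}")
```

Note the catch: a savepoint per row has real overhead on large batches, but for hundreds to a thousand lines it should be tolerable.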
IDEA 2: My second idea was to ROLLBACK after each INSERT. This lets me
check for things like NOT NULL and data type issues, but not for violations
of UNIQUE within the new data itself.
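(That particular gap, duplicates within the batch itself, can at least be closed cheaply on the client side before any INSERT is attempted. A minimal sketch, assuming the rows have already been parsed from the tab-delimited file and that column index 0 is the UNIQUE column; both assumptions are hypothetical:)

```python
# Sketch: find duplicate keys *within* the new data, which per-row
# INSERT-then-ROLLBACK cannot catch (each rolled-back row vanishes
# before the next is tried). key_index=0 is a hypothetical UNIQUE column.
from collections import defaultdict

def find_batch_duplicates(rows, key_index=0):
    """Return {key: [line numbers]} for keys appearing more than once."""
    seen = defaultdict(list)
    for lineno, row in enumerate(rows, start=1):
        seen[row[key_index]].append(lineno)
    return {k: lines for k, lines in seen.items() if len(lines) > 1}

rows = [("a1", "x"), ("a2", "y"), ("a1", "z")]   # hypothetical parsed TSV
print(find_batch_duplicates(rows))               # → {'a1': [1, 3]}
```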
IDEA 3: Write my own data pre-conditioner. Ugh, what a nightmare; I feel
like I'm writing my own database! Checking FOREIGN KEY constraints, UNIQUE,
etc. is not trivial. It seems ridiculous to do this when I have the *actual*
database available to test against!
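(A common middle ground, offered here as a sketch rather than a definitive answer, is a staging table: load the raw file, with its line numbers, into a table that has no constraints, then let set-based queries against the live schema report every violation at once. The sqlite3 demo below is self-contained; the table and column names are invented for illustration. In PostgreSQL the staging table would typically be filled with COPY, and the FK check joins the real target tables.)

```python
# Sketch of the staging-table approach: load the raw input into an
# unconstrained staging table (keeping the input line number), then
# run set-based queries that report *all* violations in one pass.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE colors (id INTEGER PRIMARY KEY)")  # hypothetical FK target
cur.executemany("INSERT INTO colors VALUES (?)", [(1,), (2,)])

# Staging table: no constraints, plus the original input line number.
cur.execute("CREATE TABLE staging (lineno INTEGER, name TEXT, color_id INTEGER)")
rows = [(1, "alice", 1), (2, None, 2), (3, "carol", 9)]      # hypothetical input
cur.executemany("INSERT INTO staging VALUES (?, ?, ?)", rows)

report = []
# NOT NULL check on a required column
cur.execute("SELECT lineno FROM staging WHERE name IS NULL")
report += [(ln, "name is NULL") for (ln,) in cur.fetchall()]
# FOREIGN KEY check against the live table
cur.execute("""SELECT s.lineno FROM staging s
               LEFT JOIN colors c ON c.id = s.color_id
               WHERE c.id IS NULL""")
report += [(ln, "unknown color_id") for (ln,) in cur.fetchall()]

for ln, msg in sorted(report):
    print(f"line {ln}: {msg}")
```

The appeal is that the checks are ordinary SQL against the actual schema, so nothing is reimplemented in the client, and nothing touches the real tables until the report comes back clean.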
Has anyone dealt with this kind of issue before? What are your opinions on
best practice for this? Of course I do not want to actually COMMIT until the
data is perfect!
Thanks for your time!
-Jon
--
View this message in context: http://postgresql.nabble.com/Using-the-database-to-validate-data-tp5859046.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.