Re: [GENERAL] copy command -- foiled by pg_atoi

From: Ed Loehr <ELOEHR(at)austin(dot)rr(dot)com>
To: Thomas Reinke <reinke(at)e-softinc(dot)com>
Cc: Jose Soares <jose(at)sferacarta(dot)com>, Mike Beller <beller(at)tradeworx(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: [GENERAL] copy command -- foiled by pg_atoi
Date: 1999-12-20 20:47:21
Message-ID: 385E95D9.CAD4F50D@austin.rr.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

I have found that judicious placement of a few queries (selects, intentional
errors, etc.) within a long sequence of inserts will help segment them for
identification of offending lines. Hokie, but it helps me.

Cheers,
Ed Loehr

Thomas Reinke wrote:

> I've run into this a few times as well. My strategy to "hunt" down
> the offending line has been to do a "bisection" algorithm.
>
> Essentially, cut your file in half. If the first half loads ok,
> you know the problem is in the second half, and vice versa.
> Now cut the offending half in half again. Do the same.
> With 23,000 rows, you will do this sequence no more than 15
> times, and you will have narrowed down the offender (assuming
> 23,000 rows). With 2 million rows (something we've had to contend
> with in the past), you'll nail it down in 21 iterations.
>
> Not pretty to say the least, but workable.
>
> Cheers, Thomas
>
> Jose Soares wrote:
> >
> > This is also my problem. I'm getting '@!?àù§èé+*_|!&/%§¸' to load a
> > table with more than 23,000 rows
> > because I don't know in which line I have to look for the the error.
> > It's obvious that psql reads the input file line by line, and of course
> > psql knows in wich line the error
> > occurred. This information would be very important to correct any syntax
> > error in the file.
> >
> > Comments?
> >
> > Jose'
> >
> > Mike Beller wrote:
> > >
> > > Hi
> > >
> > > I'm using postgresql on RedHat/i386 from the RH6.1 RPMs
> > > (Linux 2.2.5-15 PostgreSQL-6.5.2). I've been evaluating it,
> > > and am quite excited about the possibilty of using this outstanding
> > > open-source code in my company's systems.
> > >
> > > I noticed the following and want to submit it for your consideration:
> > >
> > > When doing a 'copy' from a big text file, if there is a record, say
> > > half way down, which has an unparseable integer field (say 'X'
> > > in a column that shoudl be an integer) one gets the following error:
> > >
> > > ERROR: pg_atoi: error in "X": can't parse "X"
> > >
> > > However, I don't get a line number reference where the error
> > > occurred. This is a problem when you have half a million lines
> > > in your file, some of which may contain legitimate X's as
> > > well as non-legit ones. I don't even know which attribute failed.
> > >
> > > Simple example: (imagine a million rows with 12 columns and
> > > you're trying to find the problem:)
> > >
> > > -------
> > > create table foo (x int);
> > > copy foo from stdin;
> > > 1
> > > 2
> > > 3
> > > X
> > > 4
> > > \.
> > > -----------
> > >
> > > I noticed that in pg_atoi there is this code:
> > >
> > > if (errno) /* strtol must set ERANGE */
> > > elog(ERROR, "pg_atoi: error reading \"%s\": %m", s);
> > > if (badp && *badp && (*badp != c))
> > > elog(ERROR, "pg_atoi: error in \"%s\": can\'t parse
> > > \"%s\"",s, badp);
> > >
> > > This is called (via int4in or some such) from copy.c:
> > >
> > > values[i] = (Datum) (*fmgr_faddr(&in_functions[i]))(string,
> > > elements[i],typmod[i]);
> > > /*
> > > * Sanity check - by reference attributes cannot
> > > * return NULL
> > > */
> > > if (!PointerIsValid(values[i]) &&
> > > !(rel->rd_att->attrs[i]->attbyval))
> > > elog(ERROR, "copy from line %d: Bad file format",
> > > lineno);
> > >
> > > So the lineno information is there, it's just not available when pg_atoi
> > > logs its error and bails out.
> > >
> > > Is it possible for pg_atoi to return an invalid pointer instead
> > > of completely bailing out, thus allowing the 'sanity check' code
> > > to print the line number? I have no idea how many things this
> > > might break...
> > >
> > > Regards
> > >
> > > Mike Beller
> > > CTO
> > > Tradeworx.com
> > >
> > > ************
> >
> > ************
>
> --
> ------------------------------------------------------------
> Thomas Reinke Tel: (905) 331-2260
> Director of Technology Fax: (905) 331-2504
> E-Soft Inc. http://www.e-softinc.com
>
> ************

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Fabiano Ralo Monteiro 1999-12-20 21:09:52 PostgreSQL rows limit
Previous Message Bruce Momjian 1999-12-20 18:16:54 Re: [GENERAL] recovering a "lost" database