From: | Rich Shepard <rshepard(at)appl-ecosys(dot)com> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Finding Errors in .csv Input Data |
Date: | 2011-02-23 02:58:40 |
Message-ID: | alpine.LNX.2.00.1102221853060.32123@salmo.appl-ecosys.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On Tue, 22 Feb 2011, Adrian Klaver wrote:
> We are going to need to see at least a sample of the actual data in
> one.csv that is causing the problem. You have an off by two error as you
> suggest, but that could actually have happened earlier in the row. For
> instance the well_finish_date would insert into lot_no because lot_no is
> TEXT and the date value at this point is just text. Same with
> date_cmplt_acc and block_no.
Adrian/Andy,
The data came out of Access as comma-and-quote csv. I massaged it in emacs
and sed to change the delimiter to a pipe rather than a comma and removed
the double quotes.
I cannot assume that each of the 80 problem rows suffer from the defect in
the same place, so if there's a generic process I can apply it row-by-row.
After all, 80 problem rows out of 110,752 is not bad.
Here are the schema and the first row, in one.csv. Because I don't know if
attachments are stripped off before the message is allowed to be distributed
to the list subscribers, I'll just include both here.
DDL:
The column names are original (except for 'ref' that seems to be a
reserved word), but I modifed the data types.
CREATE TABLE water_well (
sequence_no TEXT PRIMARY KEY,
well_log TEXT,
app VARCHAR(20),
notice_of_intent VARCHAR(6),
waiver_no VARCHAR(30),
date_log_rcvd DATE,
date_log_rcvd_acc CHAR(1),
site_type CHAR(1),
work_type CHAR(1),
work_type_rmks TEXT,
proposed_use CHAR(1),
drilling_method CHAR(1),
sc TEXT,
ha TEXT,
twn VARCHAR(3),
legal_twn VARCHAR(3),
rng VARCHAR(3),
legal_rng VARCHAR(3),
sec TEXT,
sec_quarters TEXT,
legal_quarters TEXT,
quarters_seq TEXT,
ref TEXT,
latitude NUMERIC(9,6),
longitude NUMERIC(9,6),
lat_long_src VARCHAR(5),
lat_long_acc CHAR(1),
owner_current TEXT,
owner_address TEXT,
owner_no TEXT,
parcel_no TEXT,
subdivision_name TEXT,
lot_no TEXT,
block_no TEXT,
well_finish_date DATE,
date_cmplt_acc CHAR(1),
gravel_packed CHAR(1),
depth_seal INTEGER,
depth_drilled INTEGER,
depth_bedrock INTEGER,
aquifer_desc TEXT,
depth_cased INTEGER,
csng_diameter FLOAT,
csng_reductions INTEGER,
top_perf INTEGER,
bottom_perf INTEGER,
perf_intervals INTEGER,
static_wl FLOAT,
temperature FLOAT,
yield FLOAT,
drawdown FLOAT,
hours_pumped FLOAT,
test_method CHAR(1),
qual_const_data CHAR(1),
qual_lith_data CHAR(1),
remarks TEXT,
remarks_additional TEXT,
contractor_lic_no VARCHAR(8),
contractor_name TEXT,
contractor_address TEXT,
contractor_drlr_no VARCHAR(6),
driller_lic_no VARCHAR(6),
source_agency TEXT,
user_id TEXT,
date_entry DATE,
update_user_id VARCHAR(16),
date_update DATE,
edit_status VARCHAR(16),
well_start_date DATE,
gravel_pack_top INTEGER,
gravel_pack_bot INTEGER,
utm_x NUMERIC(13,6),
utm_y NUMERIC(13,6)
);
Here's one.csv:
68670|724||0||11/27/1948|D|N|N||H|C|32031|087|N18|18N|E20|20E|07||||MD|39.44|119.77|NV003|M|KAIPER, R L|||SIERRA MANOR||11/15/1948|D|||106|||106|6.62|0|60|102|1|12.00||30.00|||B|G|G|AIR COMPRESSOR TESTED 30 GPM ALSO||3|MEL MEYER|RT 1 BOX 10 RENO|||3|NV003|JSWINGHOLM|1/16/2003|||F|11/11/1948|||261013.36|4369139.23
I hope you're able to see what I keep missing as the source of the
problem.
Rich
From | Date | Subject | |
---|---|---|---|
Next Message | John R Pierce | 2011-02-23 03:33:13 | Re: Mysql to Postgresql |
Previous Message | Adrian Klaver | 2011-02-23 01:45:53 | Re: Finding Errors in .csv Input Data |