From: | Lew <noone(at)lewscanon(dot)com> |
---|---|
To: | pgsql-sql(at)postgresql(dot)org |
Subject: | Re: extracting location info from string |
Date: | 2011-05-24 16:57:57 |
Message-ID: | irgo1v$dbv$1@news.albasani.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-sql |
Tarlika Elisabeth Schmitz wrote:
> Lew wrote:
>> That isn't a table structure, that's a freeform text structure. You
>> didn't state your question, Tarlika, but your database structure is
>> terrible. For example, "region" and "country" should be different
>> columns.
> I presume you are referring to my original post:
> CREATE TABLE person
> (
> id integer NOT NULL,
> "name" character varying(256) NOT NULL,
> "location" character varying(256),
> CONSTRAINT person_pkey PRIMARY KEY (id)
> );
>
> Sorry, this was just a TEMPORARY table I created for quick analysis of
> my CSV data (now renamed to temp_person).
>
>
>
> The target table is:
> CREATE TABLE person
> (
> id integer NOT NULL,
> "name" character varying(100) NOT NULL,
> country character varying(3),
> county character varying(3),
> town character varying(50),
> CONSTRAINT trainer_pkey PRIMARY KEY (id),
> CONSTRAINT country_person_fk FOREIGN KEY (country)
> REFERENCES country (id) MATCH SIMPLE
> ON UPDATE CASCADE ON DELETE RESTRICT,
> CONSTRAINT county_person_fk FOREIGN KEY (country, county)
> REFERENCES county (country, code) MATCH SIMPLE
> ON UPDATE NO ACTION ON DELETE NO ACTION,
> );
Ah, yes, that makes much more sense. Temporary tables such as you describe
can be very convenient and effective. Thanks for the clarification.
I think this problem is very widespread, namely how to get structured
information out of freeform data. I've encountered it many times over the
years, as have so many I know. I believe that human intervention will always
be needed for this type of work, e.g., distinguishing place names that seem
the same or correlating ones that seem distinct. I also don't know of any
perfect approach. Perhaps the best one can find is a probabilistic promise
that error will be less than some epsilon.
That said, if you have a robust process to correct errors as the user
population discovers them, then you can approach perfection asymptotically.
Sometimes the best solution to a technical problem is a good human process.
From an engineering standpoint, user feedback is a vital element of
homeostatic control.
Edward W. Rouse's suggestion of a reference table to resolve different forms
of address or region identification would fit well with such a process.
--
Lew
Honi soit qui mal y pense.
http://upload.wikimedia.org/wikipedia/commons/c/cf/Friz.jpg
From | Date | Subject | |
---|---|---|---|
Next Message | Robert Haas | 2011-05-24 19:14:23 | Re: Performance of NOT IN and <> with PG 9.0.4 |
Previous Message | Jasmin Dizdarevic | 2011-05-24 11:45:36 | Re: Performance of NOT IN and <> with PG 9.0.4 |