Re: extracting location info from string

From: Rob Sargent <robjsargent(at)gmail(dot)com>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: extracting location info from string
Date: 2011-05-25 22:07:30
Message-ID: 4DDD7DA2.7040309@gmail.com
Lists: pgsql-sql

On 05/25/2011 03:13 PM, Tarlika Elisabeth Schmitz wrote:
> On Wed, 25 May 2011 09:25:48 -0600
> Rob Sargent<robjsargent(at)gmail(dot)com> wrote:
>
>>
>>
>> On 05/24/2011 10:57 AM, Lew wrote:
>>> Tarlika Elisabeth Schmitz wrote:
>>>
>>>> CREATE TABLE person
>>>> (
>>>> id integer NOT NULL,
>>>> "name" character varying(256) NOT NULL,
>>>> "location" character varying(256),
>>>> CONSTRAINT person_pkey PRIMARY KEY (id)
>>>> );
>>>>
>>>> this was just a TEMPORARY table I created for quick analysis
>>>> of my CSV data (now renamed to temp_person).
>
> CREATE TABLE country
> (
> id character varying(3) NOT NULL, -- alpha-3 code
> "name" character varying(50) NOT NULL,
> CONSTRAINT country_pkey PRIMARY KEY (id)
> );
>
>
>> To minimize the ultimately quite necessary human adjudication, one
>> might make good use of what is often termed "crowd sourcing": Keep
>> all the distinct "hand entered" values and a map to the final human
>> assessment.
>
> I was wondering how to do just that. I don't think it would be a good
> idea to hard code this into the clean-up script. Take, for instance,
> variations of COUNTRY.NAME spelling. Where would I store these?
>
> I could do with a concept for this problem, which applies to a lot of
> string-type info.
>
I think you keep your current structures for deducing the canonical
forms, but each time you encounter a new unique input you add it to your
seen-thus-far list, which becomes just one more check (possibly the first
such check).

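create table address_input
(
id serial primary key, -- auto-generated from a sequence
human_input character varying(256), -- the value exactly as entered
resolution character varying(256) -- the final human assessment
);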

You may have to add a column for the type of input (if you know, for
instance, that the input is a street address vs. a country), or you may
want the resolution to be split into county, city and so on.
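
As a rough sketch of the "type of input" variant, something along these
lines might work. The table name input_resolution, the input_type labels
and the sample rows are all made up for illustration; only the general
shape follows the address_input idea above.

```sql
-- Hypothetical variant of address_input with an input-type column.
CREATE TABLE input_resolution
(
    id          serial PRIMARY KEY,
    input_type  character varying(20)  NOT NULL,  -- assumed labels, e.g. 'country', 'city'
    human_input character varying(256) NOT NULL,  -- the value exactly as entered
    resolution  character varying(256),           -- NULL until a human has ruled on it
    CONSTRAINT input_resolution_uq UNIQUE (input_type, human_input)
);

-- Illustrative rows: spelling variants already adjudicated by a person.
INSERT INTO input_resolution (input_type, human_input, resolution)
VALUES ('country', 'U.K.', 'GBR'),
       ('country', 'Great Britain', 'GBR');

-- First check during clean-up: has this exact input been seen before?
SELECT resolution
FROM input_resolution
WHERE input_type = 'country'
  AND human_input = 'U.K.';

-- An input not seen so far is queued with a NULL resolution for later review.
INSERT INTO input_resolution (input_type, human_input)
SELECT 'country', 'Untied Kingdom'
WHERE NOT EXISTS (SELECT 1
                  FROM input_resolution
                  WHERE input_type = 'country'
                    AND human_input = 'Untied Kingdom');
```

Rows whose resolution is still NULL are then the to-do list for the human
adjudicator, and the clean-up script itself needs no hard-coded spellings.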
