From: | Andrej <andrej(dot)groups(at)gmail(dot)com> |
---|---|
To: | Tarlika Elisabeth Schmitz <postgresql3(at)numerixtechnology(dot)de> |
Cc: | pgsql-sql(at)postgresql(dot)org |
Subject: | Re: extracting location info from string |
Date: | 2011-05-25 22:15:50 |
Message-ID: | BANLkTimBKGkhdzYCZqBR=pxa9bNhUBXWLg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-sql |
On 26 May 2011 09:13, Tarlika Elisabeth Schmitz
<postgresql3(at)numerixtechnology(dot)de> wrote:
> On Wed, 25 May 2011 09:25:48 -0600
> Rob Sargent <robjsargent(at)gmail(dot)com> wrote:
>
>>
>>
>>On 05/24/2011 10:57 AM, Lew wrote:
>>> Tarlika Elisabeth Schmitz wrote:
>>>
>>>> CREATE TABLE person
>>>> (
>>>> id integer NOT NULL,
>>>> "name" character varying(256) NOT NULL,
>>>> "location" character varying(256),
>>>> CONSTRAINT person_pkey PRIMARY KEY (id)
>>>> );
>>>>
>>>> this was just a TEMPORARY table I created for quick analysis
>>>> of my CSV data (now renamed to temp_person).
>
> CREATE TABLE country
> (
> id character varying(3) NOT NULL, -- alpha-3 code
> "name" character varying(50) NOT NULL,
> CONSTRAINT country_pkey PRIMARY KEY (id)
> );
>
>
>>To minimize the ultimately quite necessary human adjudication, one
>>might make good use of what is often termed "crowd sourcing": Keep
>>all the distinct "hand entered" values and a map to the final human
>>assessment.
>
> I was wondering how to do just that. I don't think it would be a good
> idea to hard code this into the clean-up script. Take, for instance,
> variations of COUNTRY.NAME spelling. Where would I store these?
>
> I could do with a concept for this problem, which applies to a lot of
> string-type info.
I'd start w/ downloading a list as mentioned here:
http://answers.google.com/answers/threadview?id=596822
And run it through a wee perl script using
http://search.cpan.org/~maurice/Text-DoubleMetaphone-0.07/DoubleMetaphone.pm
to make phonetic matches ...
Then I'd run your own data through DoubleMetaphone, and clean up
matches if not too many false positives show up.
Cheers,
Andrej
From | Date | Subject | |
---|---|---|---|
Next Message | Ozer, Pam | 2011-05-25 22:18:20 | Re: Sorting Issue |
Previous Message | Rob Sargent | 2011-05-25 22:07:30 | Re: extracting location info from string |