Re: extracting location info from string

From: Tarlika Elisabeth Schmitz <postgresql3(at)numerixtechnology(dot)de>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: extracting location info from string
Date: 2011-05-26 20:40:12
Message-ID: 20110526214012.24213f0c@dick.coachhouse
Lists: pgsql-sql

On Thu, 26 May 2011 10:15:50 +1200
Andrej <andrej(dot)groups(at)gmail(dot)com> wrote:

>On 26 May 2011 09:13, Tarlika Elisabeth Schmitz
><postgresql3(at)numerixtechnology(dot)de> wrote:
>> On Wed, 25 May 2011 09:25:48 -0600
>> Rob Sargent <robjsargent(at)gmail(dot)com> wrote:
>>
>>>
>>>
>>>On 05/24/2011 10:57 AM, Lew wrote:
>>>> Tarlika Elisabeth Schmitz wrote:
>>>>
>>>>> CREATE TABLE person
>>>>> (
>>>>> id integer NOT NULL,
>>>>> "name" character varying(256) NOT NULL,
>>>>> "location" character varying(256),
>>>>> CONSTRAINT person_pkey PRIMARY KEY (id)
>>>>> );
>>>>>
>>>>> this was just a TEMPORARY table I created for quick analysis
>>>>> of my CSV data (now renamed to temp_person).
>>
>> CREATE TABLE country
>> (
>>  id character varying(3) NOT NULL, -- alpha-3 code
>>  "name" character varying(50) NOT NULL,
>>  CONSTRAINT country_pkey PRIMARY KEY (id)
>> );
>>
>>
>>>To minimize the ultimately quite necessary human adjudication, one
>>>might make good use of what is often termed "crowd sourcing":  Keep
>>>all the distinct "hand entered" values and a map to the final human
>>>assessment.
>>[...]
>> I could do with a concept for this problem, which applies to a lot of
>> string-type info.
>
>I'd start w/ downloading a list as mentioned here:
>http://answers.google.com/answers/threadview?id=596822
>
>And run it through a wee perl script using
>http://search.cpan.org/~maurice/Text-DoubleMetaphone-0.07/DoubleMetaphone.pm
>to make phonetic matches ...
>
>Then I'd run your own data through DoubleMetaphone, and clean up
>matches if not too many false positives show up.
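
Both suggestions can also be tried without leaving PostgreSQL. The contrib module fuzzystrmatch provides dmetaphone(), an implementation of the same Double Metaphone algorithm as the Perl module linked above. A mapping table can hold each distinct hand-entered location together with the human verdict, as Rob suggests. What follows is a minimal sketch, assuming PostgreSQL 9.1 or later with fuzzystrmatch available. demo_country and demo_person are hypothetical, cut-down stand-ins for the country and temp_person tables. location_map, its columns and all sample rows are likewise hypothetical.

```sql
-- Sketch only: assumes PostgreSQL 9.1+ with the contrib module fuzzystrmatch.
CREATE EXTENSION IF NOT EXISTS fuzzystrmatch;

-- Hypothetical, cut-down stand-ins for the country and temp_person tables.
CREATE TEMP TABLE demo_country (
  id   varchar(3)  PRIMARY KEY,  -- alpha-3 code
  name varchar(50) NOT NULL
);
INSERT INTO demo_country VALUES ('IRL', 'Ireland'), ('FRA', 'France');

CREATE TEMP TABLE demo_person (
  id       integer PRIMARY KEY,
  location varchar(256)
);
INSERT INTO demo_person VALUES
  (1, 'Ireland'), (2, 'Irelend'), (3, 'Irelend'), (4, 'Frannce'), (5, NULL);

-- One row per distinct hand-entered value: the phonetic suggestion and,
-- once someone has looked at it, the human verdict.
CREATE TEMP TABLE location_map (
  raw_location varchar(256) PRIMARY KEY,           -- exactly as entered
  suggested_id varchar(3) REFERENCES demo_country, -- dmetaphone() guess
  confirmed_id varchar(3) REFERENCES demo_country  -- set by a human
);

-- Seed the map with values not seen before, suggesting the country whose
-- name has the same primary Double Metaphone code (if any).
INSERT INTO location_map (raw_location, suggested_id)
SELECT p.location,
       (SELECT c.id
          FROM demo_country c
         WHERE dmetaphone(c.name) = dmetaphone(p.location)
         ORDER BY c.id
         LIMIT 1)
  FROM (SELECT DISTINCT location FROM demo_person
         WHERE location IS NOT NULL) AS p
 WHERE NOT EXISTS (SELECT 1 FROM location_map m
                    WHERE m.raw_location = p.location);

-- A reviewer accepts (or overrides) a suggestion once; every row with
-- the same raw value then resolves through the map.
UPDATE location_map SET confirmed_id = suggested_id
 WHERE raw_location = 'Irelend';

SELECT p.id, p.location, m.confirmed_id
  FROM demo_person p
  LEFT JOIN location_map m ON m.raw_location = p.location;
```

The human only ever judges each distinct spelling once, and rerunning the seeding INSERT on a new CSV load adds only unseen values. According to the fuzzystrmatch documentation, dmetaphone() does not work well with multibyte encodings such as UTF-8, so non-ASCII spellings may need extra care.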

Many thanks for all your suggestions. It will take me a while to work
my way through them, as I have several other loose ends.

In a similar vein, the names in PERSON come in three forms:
1) <firstname> <surname>
2) <initials> <surname> (the most common)
3) <title> <initials>|<firstname> <surname>
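
The three formats can be told apart with a whitespace split: the last token is the surname, a leading token found in a title list is the title, and whatever lies between is either initials or a first name. A minimal sketch, assuming single-word surnames. The name_sample table, the name_parts view, the sample rows and the title list are all hypothetical and incomplete.

```sql
-- Hypothetical sample names, one per format listed above.
CREATE TEMP TABLE name_sample (name varchar(256) NOT NULL);
INSERT INTO name_sample VALUES
  ('John Smith'),      -- 1) <firstname> <surname>
  ('J.R. Smith'),      -- 2) <initials> <surname>
  ('Mrs Mary Jones'),  -- 3) <title> <firstname> <surname>
  ('Dr A.B. Brown');   -- 3) <title> <initials> <surname>

CREATE TEMP VIEW name_parts AS
WITH tokens AS (
  -- Split on runs of whitespace.
  SELECT name, regexp_split_to_array(trim(name), '\s+') AS t
  FROM name_sample
), tagged AS (
  -- A leading token in this (hypothetical, incomplete) list is a title.
  SELECT name, t,
         lower(rtrim(t[1], '.')) IN ('mr', 'mrs', 'miss', 'ms', 'dr') AS has_title
  FROM tokens
), parts AS (
  SELECT name,
         CASE WHEN has_title THEN t[1] END AS title,
         -- Everything between the title (if any) and the last token.
         array_to_string(
           t[(CASE WHEN has_title THEN 2 ELSE 1 END):(array_length(t, 1) - 1)],
           ' ') AS given,
         t[array_length(t, 1)] AS surname
  FROM tagged
)
SELECT name, title, given, surname,
       -- Only capital letters, optional dots and spaces: treat as initials.
       CASE WHEN given ~ '^([A-Z]\.?\s*)+$' THEN 'initials'
            WHEN given <> '' THEN 'firstname'
       END AS given_kind
FROM parts;

SELECT * FROM name_parts;
```

Names with double-barrelled surnames or unlisted titles will be split wrongly by this heuristic, so the rows it cannot classify cleanly are good candidates for the same kind of human-reviewed mapping table as the locations.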

Where I have a first name and/or a title, I'd be quite keen to
determine the person's sex, as it would be interesting to distinguish
men and women in the statistics.
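
Once title and first name are split out, sex can be looked up in the same spirit as the location map. A title such as Mrs or Mr decides it outright, a first name serves as a fallback, and anything not covered stays NULL rather than being guessed. A minimal sketch: title_sex, firstname_sex, split_name, the person_sex view and all their rows are hypothetical. A real first-name list would have to be loaded from an external source of (mainly English) first names.

```sql
-- Hypothetical lookup: titles that imply a sex ('dr' deliberately absent).
CREATE TEMP TABLE title_sex (title varchar(10) PRIMARY KEY, sex char(1) NOT NULL);
INSERT INTO title_sex VALUES ('mr', 'M'), ('mrs', 'F'), ('miss', 'F'), ('ms', 'F');

-- Hypothetical, tiny first-name list; ambiguous names are better left out.
CREATE TEMP TABLE firstname_sex (firstname varchar(50) PRIMARY KEY, sex char(1) NOT NULL);
INSERT INTO firstname_sex VALUES ('john', 'M'), ('mary', 'F');

-- Hypothetical, already-split name data.
CREATE TEMP TABLE split_name (
  id        integer PRIMARY KEY,
  title     varchar(10),
  firstname varchar(50),
  surname   varchar(256) NOT NULL
);
INSERT INTO split_name VALUES
  (1, NULL,  'John', 'Smith'),  -- first name decides
  (2, 'Mrs', NULL,   'Jones'),  -- initials only, but the title decides
  (3, 'Dr',  'Mary', 'Brown'),  -- neutral title, first name decides
  (4, 'Dr',  NULL,   'Green');  -- nothing to go on: sex stays NULL

CREATE TEMP VIEW person_sex AS
SELECT s.id, s.surname,
       COALESCE(t.sex, f.sex) AS sex  -- title wins; first name as fallback
  FROM split_name s
  LEFT JOIN title_sex t     ON t.title = lower(rtrim(s.title, '.'))
  LEFT JOIN firstname_sex f ON f.firstname = lower(s.firstname);

SELECT * FROM person_sex ORDER BY id;
```

Since initials are the most common form, many rows will only be resolvable via the title, and the share left at NULL is worth reporting alongside any statistics.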

I am really only interested in people from two countries, and the
names are mainly English.

--

Best Regards,
Tarlika Elisabeth Schmitz
