Re: Need magic for identifieing double adresses

From: Gary Chambers <gwchamb(at)gmail(dot)com>
To: Andreas <maps(dot)on(at)gmx(dot)net>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Need magic for identifieing double adresses
Date: 2010-09-16 03:01:25
Message-ID: AANLkTi=jQ+sgxu=VHJft__PZrwykDL4LvEKieQAn2wak@mail.gmail.com
Lists: pgsql-general

Andreas,

> Relevant fields could be name, street, zip, city, phone
> Is there a way to do something like this with PostgreSQL?
> I fear this will still need a lot of manual sorting and searching even when
> potential peers get automatically identified.

One of the techniques I use to increase the odds of detecting
duplicates is to trim each column, remove all internal whitespace,
concatenate the results into a single string, and calculate an MD5
hash of it (some other hash function may be better). Rows that end up
with the same hash are likely duplicates. It's not perfect (we are
dealing with humans, after all), but it helps.
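
To make the steps above concrete, here is a minimal sketch in Python
rather than SQL. The field names come from the list in your question;
the function names and the dict-per-row layout are only assumptions
for illustration, so adapt them to your schema.

```python
import hashlib
import re
from collections import defaultdict

# Field names taken from the question; adjust to the real table (assumption).
FIELDS = ("name", "street", "zip", "city", "phone")


def fingerprint(record):
    """Return the MD5 hex digest of a record's whitespace-free fields.

    `record` is assumed to be a dict mapping field names to strings
    (or None for missing values).
    """
    parts = []
    for field in FIELDS:
        value = record.get(field) or ""  # treat missing/None as empty
        # Removing every whitespace run also covers the trim step.
        parts.append(re.sub(r"\s+", "", value))
    joined = "".join(parts)  # one string per record
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def group_duplicates(records):
    """Return lists of records that share the same fingerprint."""
    groups = defaultdict(list)
    for record in records:
        groups[fingerprint(record)].append(record)
    return [group for group in groups.values() if len(group) > 1]
```

Calling group_duplicates() on a list of such dicts returns only the
groups with more than one member, which are the candidates to review
by hand. This only catches records that differ in whitespace; spelling
or case differences still produce different hashes.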

-- Gary Chambers

/* Nothing fancy and nothing Microsoft! */
