"Fuzzy" Matches on Nicknames

From:	Michael Sheaver <msheaver(at)me(dot)com>
To:	pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject:	"Fuzzy" Matches on Nicknames
Date:	2016-11-30 00:10:59
Message-ID:	18DF7A91-78F6-4F63-8A7E-BEBE3AEE7AC6@me.com
Lists:	pgsql-general

Greetings,

I have two tables that are populated using large datasets from disparate external systems, and I am trying to match records by customer name between these two tables. I do not have any authoritative key, such as customerID or nationalID, by which I can match them up, and I have found many cases where the same customer has different first names in the two datasets. A sampling of the differences is as follows:

Michael <=> Mike
Tom <=> Thomas
Liz <=> Elizabeth
Margaret <=> Maggie

How can I build a query in PostgreSQL (v. 9.6) that will find possible matches like these on nicknames? My initial guess is that I would have to either find or build some sort of intermediary table that contains associated names like those above. Sometimes, though, a group will contain more than two names, for example:

Jim <=> James <=> Jimmy <=> Jimmie
Bill <=> Will <=> Willie <=> William

and so forth.
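
To make the intermediary-table idea concrete, here is a minimal sketch of one way it could look. It is only an illustration, not a tested PostgreSQL solution: it uses Python's built-in sqlite3 module so that it runs anywhere, and the table names (name_group, customers_a, customers_b), the column names, and the sample customers are all hypothetical. Each set of related names shares one group_id, so groups of any size (Jim, James, Jimmy, Jimmie) are handled the same way as pairs. The same SELECT should carry over to PostgreSQL largely unchanged.

```python
import sqlite3

# Hypothetical nickname groups; every name in a row is treated as equivalent.
NAME_GROUPS = [
    ["Michael", "Mike"],
    ["Thomas", "Tom"],
    ["Elizabeth", "Liz"],
    ["Margaret", "Maggie"],
    ["James", "Jim", "Jimmy", "Jimmie"],
    ["William", "Bill", "Will", "Willie"],
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE name_group (
        group_id INTEGER NOT NULL,
        name     TEXT    NOT NULL,      -- stored in lower case
        PRIMARY KEY (group_id, name)    -- a name may appear in several groups
    );
    CREATE TABLE customers_a (first_name TEXT, last_name TEXT);
    CREATE TABLE customers_b (first_name TEXT, last_name TEXT);
""")

# One row per (group, name); the group number is just the list position.
conn.executemany(
    "INSERT INTO name_group VALUES (?, ?)",
    [(gid, name.lower())
     for gid, names in enumerate(NAME_GROUPS)
     for name in names],
)

# Made-up sample data standing in for the two external datasets.
conn.executemany("INSERT INTO customers_a VALUES (?, ?)",
                 [("Mike", "Doe"), ("Jimmie", "Jones"), ("Liz", "Smith")])
conn.executemany("INSERT INTO customers_b VALUES (?, ?)",
                 [("Michael", "Doe"), ("James", "Jones"), ("Tom", "Smith")])

# Pair rows with the same last name whose first names are either identical
# or share at least one nickname group.
MATCH_SQL = """
    SELECT a.first_name, b.first_name, a.last_name
    FROM customers_a AS a
    JOIN customers_b AS b
      ON lower(a.last_name) = lower(b.last_name)
    WHERE lower(a.first_name) = lower(b.first_name)
       OR EXISTS (
            SELECT 1
            FROM name_group AS ga
            JOIN name_group AS gb ON gb.group_id = ga.group_id
            WHERE ga.name = lower(a.first_name)
              AND gb.name = lower(b.first_name))
    ORDER BY a.last_name
"""

matches = conn.execute(MATCH_SQL).fetchall()
for row in matches:
    print(row)
```

In this sketch Liz Smith and Tom Smith are not paired, because Liz and Tom never share a group. The hard part in practice is filling name_group with a good list of nicknames, which the sketch does not address.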

Has anyone used or developed PostgreSQL queries that will find matches like these? I am running all my database queries on my local laptops (Win7 and macOS), so performance and uptime are not issues here. I am curious to see how others in this community have creatively solved this common problem.

One of the PostgreSQL dictionaries (synonym, thesaurus, etc.) might work here, but honestly I am clueless as to how to set one up or use it successfully in queries.

Thanks,
Michael (aka Mike, aka Mikey)
