From: Matthew Hagerty <matthew(at)brwholesale(dot)com>
To: "Josh Berkus" <josh(at)agliodbs(dot)com>, pgsql-sql(at)postgresql(dot)org
Subject: Re: Removing duplicates
Date: 2002-02-26 17:16:39
Message-ID: 5.1.0.14.2.20020226114626.00b2f650@imap.brwholesale.com
Lists: pgsql-sql
At 08:39 AM 2/26/2002 -0800, Josh Berkus wrote:
>Matt,
>
> > I have a customer database (name, address1, address2, city, state,
> > zip) and I need a query (or two) that will give me a mailing list
> > with the least amount of duplicates possible. I know that precise
> > matching is not possible, i.e. "P.O. Box 123" will never match "PO
> > Box 123" without some data massaging, but if I can isolate even 50%
> > of any duplicates, that would help greatly.
>
> From the sound of things, you are trying to get out a mailing with the
> least number of duplicates you can in a limited time, rather than
> trying to clean up the list for permanent storage. Chances are, you
> bought or traded this list from an outside source, yes?
Actually, the database is a collection of customers accumulated over the past
eight years. The sales people are "supposed" to look up existing customers when
they call in to place an order, but that does not always happen, and
customers have undoubtedly been added more than once. Of course, after eight
years of various computer systems, punctuation, case, spelling, different
employees, etc., the data is less than perfect. It is amazing how easily a
user can enter a duplicate no matter how tricky or smart you think your
code is! :-)
I will certainly look into the address standardization information that has
been posted. Thank you everyone for your input!
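In case it helps anyone searching the archives later, here is a rough sketch of
the kind of normalize-and-compare query that can catch the easy duplicates.
The table name "customers" is only an assumption for illustration, and the
lower()/translate() massaging is a crude stand-in, not a replacement for real
address standardization:

-- Normalize case and strip periods/commas before comparing, then keep
-- one row per normalized (name, address, zip) combination.
SELECT DISTINCT ON (norm_name, norm_addr, zip)
       name, address1, address2, city, state, zip
FROM (
    SELECT name, address1, address2, city, state, zip,
           lower(translate(name, '.,', ''))     AS norm_name,
           lower(translate(address1, '.,', '')) AS norm_addr
    FROM customers
) AS c
ORDER BY norm_name, norm_addr, zip;

With that massaging, "P.O. Box 123" and "PO Box 123" both collapse to
"po box 123", so those two rows would come out as a single mailing entry.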
Matthew
<snip>