From: | Eric Walstad <eric(at)ericwalstad(dot)com> |
---|---|
To: | sfpug(at)postgresql(dot)org |
Subject: | Re: SQL assistance, please... |
Date: | 2006-02-15 06:34:25 |
Message-ID: | 200602142234.26196.eric@ericwalstad.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | sfpug |
Hi Jeff,
On Tuesday 14 February 2006 20:45, Jeff Frost wrote:
> Eric,
>
> You probably want to run a query with a self join like this:
>
> SELECT a1.addr1, a1.address_hash FROM address a1, address a2 WHERE
> a1.address_hash = a2.address_hash AND
> a1.addr1 != a2.addr1;
Yes, I tried this, but it gives me duplicate (well, almost duplicate) results
like:
id1 | id2 | address_hash | a1_addr1 | a2_addr1
---------+---------+--------------+------------------+----------------
1986889 | 1150278 | -1000076938 | 123 My Street | 345 Your Street
1150278 | 1986889 | -1000076938 | 345 Your Street | 123 My Street
Also, it only looks for cases where the addr1 fields differ. This is better
than nothing; I can count the records and divide by two to get a total count
of collisions.
Thanks for mentioning this. Maybe I was closer than I thought. I suppose I
could OR together the comparisons of the other fields (a1.addr1 != a2.addr1
OR a1.addr2 != a2.addr2, etc) to look for instances of other mismatched
fields.
> You should probably also consider placing a unique constraint on
> address_hash so you'll get an error when attempting to insert a duplicate.
Unfortunately, I can't do this. This table data is mostly static and defined
by others. It's my job to be able to match my user's inputs to the data in
this table. My best option, I think, it to change hashing algorithms if the
number of collisions is 'too large'.
Thanks again,
Eric.
From | Date | Subject | |
---|---|---|---|
Next Message | chris mungall | 2006-02-20 05:49:48 | Positions available at LBL: bio-ontologies |
Previous Message | Jeff Frost | 2006-02-15 04:45:34 | Re: SQL assistance, please... |