From: | "John D(dot) Burger" <john(at)mitre(dot)org> |
---|---|
To: | PostgreSQL-general general <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Primary keys for companies and people |
Date: | 2006-02-07 21:27:31 |
Message-ID: | 425689c057ab9ee2c2865b760b7677a9@mitre.org |
Lists: | pgsql-general |
Leif B. Kristensen wrote:
> Still, I'm struggling with the basic concept of /identity/, eg. is the
> William Smith born to John Smith and Jane Doe in 1733, the same William
> Smith who marries Mary Jones in the same parish in 1758? You may never
> really know. Still, collecting such disparate "facts" under the same ID
> number, thus taking the identity more or less for granted, is the modus
> operandi of computer genealogy. Thus, one of the major objectives of
> genealogy research, the assertion of identity, becomes totally hidden
> the moment that you decide to cluster disparate evidence about what may
> actually have been totally different persons, under a single ID number.
We have a similar issue in a database in which we are integrating
multiple geographic gazetteers, e.g., USGS, NGA, WordNet. We cannot be
sure that source A's Foobar City is the same as source B's. Our
approach is to =always= import them as separate entities, and use a
table of equivalences that gets filled out using various heuristics.
For example, if each source indicates that its Foobar City is in Baz
County, and we decide to equate the counties, we may equate the cities.
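As a rough illustration of the approach described above, here is a minimal sketch using Python's built-in `sqlite3` module. The table names (`entity`, `equivalence`), the column layout, and the sample rows are all assumptions made up for this example, not the actual schema in use; the only heuristic shown is the county-based one mentioned above.

```python
import sqlite3

# Hypothetical schema: every gazetteer entry is imported as its own row,
# and equivalences live in a separate table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entity (
    id     INTEGER PRIMARY KEY,
    source TEXT NOT NULL,              -- e.g. 'USGS', 'NGA'
    name   TEXT NOT NULL,
    kind   TEXT NOT NULL,              -- 'city' or 'county'
    parent INTEGER REFERENCES entity(id)
);
CREATE TABLE equivalence (
    a      INTEGER NOT NULL REFERENCES entity(id),
    b      INTEGER NOT NULL REFERENCES entity(id),
    reason TEXT NOT NULL,
    PRIMARY KEY (a, b),
    CHECK (a < b)                      -- store each pair once, smaller id first
);
""")

conn.executemany(
    "INSERT INTO entity VALUES (?, ?, ?, ?, ?)",
    [
        (1, "USGS", "Baz County", "county", None),
        (2, "NGA", "Baz County", "county", None),
        (3, "USGS", "Foobar City", "city", 1),
        (4, "NGA", "Foobar City", "city", 2),
        (5, "NGA", "Foobar City", "city", None),  # county unknown
    ],
)

# Assume an earlier step already decided to equate the two counties.
conn.execute("INSERT INTO equivalence VALUES (1, 2, 'manual review')")

# Heuristic: equate same-named cities from different sources
# when their containing counties have already been equated.
conn.execute("""
INSERT OR IGNORE INTO equivalence (a, b, reason)
SELECT x.id, y.id, 'same name, equated counties'
FROM entity x
JOIN entity y
  ON x.kind = 'city' AND y.kind = 'city'
 AND x.name = y.name AND x.source <> y.source AND x.id < y.id
JOIN equivalence e
  ON e.a = MIN(x.parent, y.parent) AND e.b = MAX(x.parent, y.parent)
""")

pairs = conn.execute("SELECT a, b FROM equivalence ORDER BY a").fetchall()
print(pairs)
```

In this sketch the city with no known county (id 5) stays unequated, because the two-argument `MIN`/`MAX` in SQLite yield NULL when either parent is NULL and the join then finds no match.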
> The alternative is of course to collect each cluster of evidence under
> a separate ID, but then the handling of a "person" becomes a
> programmer's nightmare.
Our intent is to have views and/or special versions of the database
that collapse equivalent entities, but I must confess that we have not
done much along these lines - I hope it is not too nightmarish.
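One possible way to collapse equivalent entities, sketched here purely as an assumption about how it might be done, is to treat the equivalence pairs as edges and assign every entity the smallest ID in its connected group. The function name `collapse` and the sample IDs are hypothetical; the technique is a plain union-find, chosen because equivalences can chain (A = B and B = C implies A = C).

```python
def collapse(entity_ids, equivalences):
    """Map each entity ID to a canonical ID (the smallest ID in its group)."""
    parent = {e: e for e in entity_ids}

    def find(x):
        # Walk up to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in equivalences:
        ra, rb = find(a), find(b)
        if ra != rb:
            lo, hi = sorted((ra, rb))
            parent[hi] = lo  # keep the smaller ID as the group's root

    return {e: find(e) for e in entity_ids}


# Hypothetical data: entities 1 and 2 are equated, as are 3 and 4.
canonical = collapse([1, 2, 3, 4, 5], [(1, 2), (3, 4)])
print(canonical)

# Chained equivalences end up in a single group.
chained = collapse([1, 2, 3], [(2, 3), (1, 3)])
print(chained)
```

A mapping like this could then back a view or a derived copy of the database in which each group appears once, while the original per-source rows remain untouched.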
- John D. Burger
MITRE