Re: Primary keys for companies and people

From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: David Goodenough <david(dot)goodenough(at)btconnect(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Primary keys for companies and people
Date: 2006-02-02 20:09:06
Message-ID: 20060202200906.GC25752@svana.org
Lists: pgsql-general

On Thu, Feb 02, 2006 at 10:36:54AM +0000, David Goodenough wrote:
> > Still, I'm struggling with the basic concept of /identity/, eg. is the
> > William Smith born to John Smith and Jane Doe in 1733, the same William
> > Smith who marries Mary Jones in the same parish in 1758? You may never
> > really know. Still, collecting such disparate "facts" under the same ID
> > number, thus taking the identity more or less for granted, is the modus
> > operandi of computer genealogy. Thus, one of the major objectives of
> > genealogy research, the assertion of identity, becomes totally hidden
> > the moment that you decide to cluster disparate evidence about what may
> > actually have been totally different persons, under a single ID number.
> >
> > The alternative is of course to collect each cluster of evidence under a
> > separate ID, but then the handling of a "person" becomes a programmer's
> > nightmare.

> There is also the problem that a name can change. People change names
> by deed-poll, and also women can adopt a married name or keep their old
> one. All in all an ID is about the only answer.

True, the issue being of course that changing a name doesn't change
a person's identity.

To the GP, your page is an interesting one and raises several
interesting points. In particular the one about the "person" being the
conclusion of the rest of the database. You essentially have a set of
facts "A married B in C on date D" and you're trying to correlate
these. In the end it's just a certain amount of guesswork, especially
since back then they weren't as particular about spelling as we are
today.

My naive view is that you're basically assigning trust values to each
fact and the chance that two citations refer to the same person. In
principle you'd be able to cross-reference all these citations and
build the structure quasi-automatically. I suppose in practice this is
done by hand.
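
To make the "quasi-automatic" idea concrete, here is a minimal sketch in
Python. It assumes you already have some way of estimating the chance
that two citations refer to the same person; the citation labels, the
probabilities, the 0.8 threshold and the function names are all invented
for illustration and come from nowhere but my imagination.

```python
from itertools import combinations


def cluster_citations(citations, same_person_probability, threshold=0.8):
    """Group citation ids whose pairwise match probability reaches threshold."""
    # Each citation starts out as its own candidate person.
    parent = {c: c for c in citations}

    def find(c):
        # Follow parent links to the representative of c's group.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for a, b in combinations(citations, 2):
        if same_person_probability(a, b) >= threshold:
            # Merge the two groups: we now treat them as one person.
            parent[find(a)] = find(b)

    groups = {}
    for c in citations:
        groups.setdefault(find(c), []).append(c)
    return sorted(groups.values())


# Hypothetical hand-assigned probabilities for three citations.
pair_probability = {
    frozenset(("baptism-1733", "marriage-1758")): 0.85,
    frozenset(("baptism-1733", "burial-1790")): 0.30,
    frozenset(("marriage-1758", "burial-1790")): 0.40,
}


def lookup(a, b):
    # Unknown pairs are assumed not to match.
    return pair_probability.get(frozenset((a, b)), 0.0)


candidates = cluster_citations(
    ["baptism-1733", "marriage-1758", "burial-1790"], lookup
)
```

Note that merging is transitive here, so a chain of individually
plausible links can pull together two citations that were never compared
favourably with each other. That is one reason a human review step
probably remains necessary.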

As for your question, I think you're stuck with having a person ID,
basically because you need to identify a person somehow. Given that you
still have the original citations, you can split a person into several
if the identification later turns out not to hold up.
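
A rough sketch of what I mean by splitting, in Python rather than SQL to
keep it short. The Citation class, its fields and the split_person
function are hypothetical names, not taken from your schema; the point
is only that the person ID is a conclusion attached to each citation, so
it can be reassigned without touching the evidence itself.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Citation:
    citation_id: int
    text: str
    person_id: int  # current conclusion, not part of the source evidence


_next_person_id = count(1)


def new_person_id():
    """Hand out a fresh surrogate ID, much like a database sequence would."""
    return next(_next_person_id)


def split_person(citations, citation_ids_to_move):
    """Reassign the chosen citations to a freshly created person ID."""
    new_id = new_person_id()
    for c in citations:
        if c.citation_id in citation_ids_to_move:
            c.person_id = new_id
    return new_id


william = new_person_id()
citations = [
    Citation(1, "William Smith born to John Smith and Jane Doe, 1733", william),
    Citation(2, "William Smith marries Mary Jones, 1758", william),
]

# Later research suggests the groom was somebody else: move citation 2.
groom = split_person(citations, {2})
```

The citation rows never change apart from the ID they point at, so a
wrong split can be undone by assigning the old ID back.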

One thing I find odd though, your "person" objects have no birthdate or
deathdate. Or birth place either. I would have thought these elements
would be fundamental in determining if two people are the same, given
that they can't change and people are unlikely to forget them.

Put another way, two people with the same birthday in the same place
with similar names are very likely to be the same. If you can
demonstrate this is not the case, that's another fact. In the end you're
dealing with probabilities; you can never know for sure.
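
As an illustration only, such a comparison could be sketched like this
in Python. The weights, the field names and the sample records (including
the full dates and the places) are made up, and the result is a crude
score rather than a calibrated probability. Name similarity uses
difflib.SequenceMatcher from the standard library, which tolerates the
loose spelling of the period to some degree.

```python
from difflib import SequenceMatcher


def match_score(a, b, name_weight=0.4, date_weight=0.3, place_weight=0.3):
    """Return a score between 0 and 1; higher means more likely the same person."""
    # Fuzzy comparison, so "Smith" and "Smyth" still count as close.
    name_similarity = SequenceMatcher(
        None, a["name"].lower(), b["name"].lower()
    ).ratio()
    # Birth date and place cannot change, so compare them exactly.
    same_date = 1.0 if a["birth_date"] == b["birth_date"] else 0.0
    same_place = 1.0 if a["birth_place"].lower() == b["birth_place"].lower() else 0.0
    return (
        name_weight * name_similarity
        + date_weight * same_date
        + place_weight * same_place
    )


# Invented sample records.
record_a = {"name": "William Smith", "birth_date": "1733-04-12", "birth_place": "Ashford"}
record_b = {"name": "Willm Smyth", "birth_date": "1733-04-12", "birth_place": "Ashford"}
record_c = {"name": "William Smith", "birth_date": "1741-09-02", "birth_place": "Dover"}

variant_spelling = match_score(record_a, record_b)
same_name_only = match_score(record_a, record_c)
```

With these weights the variant spelling born on the same day in the same
place scores higher than the identical name born elsewhere on another
date, which matches the intuition above.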

Anyway, hope this helps. It's a subject I've been vaguely interested in
but never really had the time to look into.

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
