Re: Determining logically unique entities across many partially complete rows where at least one column matches

From: Rob Sargent <robjsargent(at)gmail(dot)com>
To: Jamie Tufnell <diesql(at)googlemail(dot)com>
Cc: pgsql-sql <pgsql-sql(at)postgresql(dot)org>
Subject: Re: Determining logically unique entities across many partially complete rows where at least one column matches
Date: 2009-08-11 15:40:51
Message-ID: 4A819103.8040102@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Seems to me that if you can safely identify which snippets correspond to
a given entity you want a single id for the entity. An entity-snippet
relationship seems a must. I would not lean too heavily on a single
table solution unless you're considering arrays for openid,email and
phone. (And given the one-to-many-real-people on phone I would be leery
of this "identifier".)

Jamie Tufnell wrote:
> Hi,
>
> I am faced with a modeling problem and thought I'd see if anyone has run
> into something similar and can offer some advice.
>
> Basically my problem domain is cataloguing "snippets of information" about
> "entities" which are loosely identified.
>
> Entities can be identified up to 3 different methods (email, phone or
> openid.)
>
> Entities can have zero or many emails, phone numbers and openids. The
> only restriction is they must have at least one value in one of those three
> columns.
>
>
> Some sample data:
>
> snippet #1
> email: null
> phone: +1234567890
> openid: john(at)myopenid(dot)net
> information: This is snippet #1
>
> snippet #2
> email: foo(at)bar(dot)com
> phone: null
> openid: johnny.name
> information: This is snippet #2
>
> At this point snippet #1 and #2 could refer to different entities.
>
> snippet #3
> email: bar(at)baz(dot)com
> phone: +1234567890
> openid: johnny.name
> information: This is snippet #3
>
> But now all three snippets definitely refer to the same entity, as far as
> we're concerned:
>
> Entity: 1
> OpenIDs: johnny.name, john(at)myopenid(dot)net
> Phones: +1234567890
> Emails: foo(at)bar(dot)com, bar(at)baz(dot)com
>
> So as far as modeling this goes, I'm stuck between:
>
> 1. Normalizing as usual with some serious triggers to maintain the
> relationships.
> 2. Just having a snippets table with these fields inline and make these
> inferences at query time.
> 3. Something in between.
> 4. Using a document store like CouchDB.
>
> The kinds of queries I need to do right now (which will no doubt change):
>
> * Return all snippets.
> * Return all distinct entities.
> * Find all id for a distinct entity given a single piece of id.
> * Find all snippets for a distinct entity.
>
> To do it in one table, I am thinking something like this:
>
> create table snippets (
> id serial not null primary key,
> email text,
> phone_number text,
> openid text,
> information text not null,
> check (email is not null or
> phone_number is not null or openid is not null)
> );
>
> with queries like:
>
> * Find all snippets for one distinct entity, searching by openid:
>
> select * from snippets
> where phone_number =
> (select phone_number from snippets where openid = 'john(at)myopenid(dot)net')
> or email =
> (select email from snippets where openid = 'john(at)myopenid(dot)net')
> or openid in
> (select openid from snippets
> where phone_number =
> (select phone_number from snippets where openid = 'john(at)myopenid(dot)net')
> or email =
> (select email from snippets where openid = 'john(at)myopenid(dot)net'));
>
>
> Or if I was to model as usual I am thinking something like this:
>
> create table entities (
> id serial not null primary key
> );
>
> create table entity_has_email (
> entity_id integer not null
> references entities (id) on delete cascade on update cascade,
> email text not null unique
> );
>
> create table entity_has_phone_number (
> entity_id integer not null
> references entities (id) on delete cascade on update cascade,
> phone_number text not null unique
> );
>
> create table entity_has_openid (
> entity_id integer not null
> references entities (id) on delete cascade on update cascade,
> openid text not null unique
> );
>
> create table snippets (
> id serial not null primary key,
> entity_id integer not null
> references entities (id) on delete cascade on update cascade,
> information text not null
> );
>
> (followed by a mass of on insert/update/delete triggers)
>
> with queries like:
>
> * Find all snippets for a distinct entity, by one identifying field:
>
> select s.* from snippets s
> join entity_has_email e on s.entity_id = e.id
> join entity_has_phone_number p on s.entity_id = p.id
> join entity_has_openid o on s.entity_id = o.id
> where o.openid = 'john(at)myopenid(dot)net';
>
> Another option, sort of half way between the two could be:
>
> create table snippets (
> id serial not null primary key,
> entity_id integer not null
> references entities (id) on delete cascade on update cascade,
> information text not null
> );
>
> create table entities (
> id serial not null primary key,
> email text,
> phone_number text,
> openid text,
> check (email is not null or
> phone_number is not null or openid is not null)
> );
>
> * Find all snippets for a distinct entity, by openid = 'john(at)myopenid(dot)net'
>
> select * from snippets
> where entity_id in (
> select id from entities
> where phone_number =
> (select phone_number from entities where openid = 'john(at)myopenid(dot)net')
> or email =
> (select email from entities where openid = 'john(at)myopenid(dot)net')
> or openid in
> (select openid from entities
> where phone_number =
> (select phone_number from entities where openid = 'john(at)myopenid(dot)net'
> or email =
> (select email from entities where openid = 'john(at)myopenid(dot)net') ));
>
> At this point I am leaning towards that last method.
>
> Has anyone had to model something similar? Did you use one of these
> methods or something else?
>
> Any/all comments appreciated!
>
> J
>
>

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Peter Headland 2009-08-11 16:55:36 Re: Updating a specific number of rows in pl/pgsql
Previous Message Bor 2009-08-11 13:13:30 Month/year between two dates