From: | Jamie Tufnell <diesql(at)googlemail(dot)com> |
---|---|
To: | pgsql-sql <pgsql-sql(at)postgresql(dot)org> |
Subject: | Determining logically unique entities across many partially complete rows where at least one column matches |
Date: | 2009-08-11 06:19:59 |
Message-ID: | b0a4f3350908102319w2e536f1eyd29609f0effff77c@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-sql |
Hi,
I am faced with a modeling problem and thought I'd see if anyone has run
into something similar and can offer some advice.
Basically my problem domain is cataloguing "snippets of information" about
"entities" which are loosely identified.
Entities can be identified up to 3 different methods (email, phone or
openid.)
Entities can have zero or many emails, phone numbers and openids. The
only restriction is they must have at least one value in one of those three
columns.
Some sample data:
snippet #1
email: null
phone: +1234567890
openid: john(at)myopenid(dot)net
information: This is snippet #1
snippet #2
email: foo(at)bar(dot)com
phone: null
openid: johnny.name
information: This is snippet #2
At this point snippet #1 and #2 could refer to different entities.
snippet #3
email: bar(at)baz(dot)com
phone: +1234567890
openid: johnny.name
information: This is snippet #3
But now all three snippets definitely refer to the same entity, as far as
we're concerned:
Entity: 1
OpenIDs: johnny.name, john(at)myopenid(dot)net
Phones: +1234567890
Emails: foo(at)bar(dot)com, bar(at)baz(dot)com
So as far as modeling this goes, I'm stuck between:
1. Normalizing as usual with some serious triggers to maintain the
relationships.
2. Just having a snippets table with these fields inline and make these
inferences at query time.
3. Something in between.
4. Using a document store like CouchDB.
The kinds of queries I need to do right now (which will no doubt change):
* Return all snippets.
* Return all distinct entities.
* Find all id for a distinct entity given a single piece of id.
* Find all snippets for a distinct entity.
To do it in one table, I am thinking something like this:
create table snippets (
id serial not null primary key,
email text,
phone_number text,
openid text,
information text not null,
check (email is not null or
phone_number is not null or openid is not null)
);
with queries like:
* Find all snippets for one distinct entity, searching by openid:
select * from snippets
where phone_number =
(select phone_number from snippets where openid = 'john(at)myopenid(dot)net')
or email =
(select email from snippets where openid = 'john(at)myopenid(dot)net')
or openid in
(select openid from snippets
where phone_number =
(select phone_number from snippets where openid = 'john(at)myopenid(dot)net')
or email =
(select email from snippets where openid = 'john(at)myopenid(dot)net'));
Or if I was to model as usual I am thinking something like this:
create table entities (
id serial not null primary key
);
create table entity_has_email (
entity_id integer not null
references entities (id) on delete cascade on update cascade,
email text not null unique
);
create table entity_has_phone_number (
entity_id integer not null
references entities (id) on delete cascade on update cascade,
phone_number text not null unique
);
create table entity_has_openid (
entity_id integer not null
references entities (id) on delete cascade on update cascade,
openid text not null unique
);
create table snippets (
id serial not null primary key,
entity_id integer not null
references entities (id) on delete cascade on update cascade,
information text not null
);
(followed by a mass of on insert/update/delete triggers)
with queries like:
* Find all snippets for a distinct entity, by one identifying field:
select s.* from snippets s
join entity_has_email e on s.entity_id = e.id
join entity_has_phone_number p on s.entity_id = p.id
join entity_has_openid o on s.entity_id = o.id
where o.openid = 'john(at)myopenid(dot)net';
Another option, sort of half way between the two could be:
create table snippets (
id serial not null primary key,
entity_id integer not null
references entities (id) on delete cascade on update cascade,
information text not null
);
create table entities (
id serial not null primary key,
email text,
phone_number text,
openid text,
check (email is not null or
phone_number is not null or openid is not null)
);
* Find all snippets for a distinct entity, by openid = 'john(at)myopenid(dot)net'
select * from snippets
where entity_id in (
select id from entities
where phone_number =
(select phone_number from entities where openid = 'john(at)myopenid(dot)net')
or email =
(select email from entities where openid = 'john(at)myopenid(dot)net')
or openid in
(select openid from entities
where phone_number =
(select phone_number from entities where openid = 'john(at)myopenid(dot)net'
or email =
(select email from entities where openid = 'john(at)myopenid(dot)net') ));
At this point I am leaning towards that last method.
Has anyone had to model something similar? Did you use one of these
methods or something else?
Any/all comments appreciated!
J
From | Date | Subject | |
---|---|---|---|
Next Message | D'Arcy J.M. Cain | 2009-08-11 10:25:03 | Re: Updating a specific number of rows in pl/pgsql |
Previous Message | A. Kretschmer | 2009-08-11 05:12:21 | Re: Import (.CVS File) to postgreSql |