Removing nulls with 6NF

From: Robert DiFalco <robert(dot)difalco(at)gmail(dot)com>
To: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Removing nulls with 6NF
Date: 2014-01-23 16:34:33
Message-ID: CAAXGW-w=gbJp3eWbT1m7DGakW3Uent=fsj10nUghyZ1vCjiBHA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I have a question about replacing NULL columns with tables. Say I have a
table called Contacts that represents the Contacts (say from Facebook) of
my Users. Right now Contacts has several columns and in addition to these
it has an Owner (the user that owns this contact) and User. The User is
NULL if there is no user in my system that matches the contact. The Owner
can never be NULL.

If I want to get all the contacts for a user and know which are and which
are not themselves users I would do an OUTER join with the Users table on
user_id (which may or may not be null). Or I may not want the user itself
and not need to perform the OUTER join at all. I just want to see if they
are NULL or a valid ID from the Users table.

So, if I endeavor to get rid of this kind of usage of NULL I could create
another table called ContactUsers with two fields (ContactID, UserID).

I'd go from this:

Contacts (ContactID, OwnerID, UserID, ...)

To this:

Contacts(ContactID, OwnerID, ... )
ContactUsers( ContactID, UserID)

So my questions are these:

I assume my new queries of all contacts with or without correlated
users would now change to all need a User column that has a SELECT query on
ContactUsers for that column.

How would this perform relative to the approach that uses a field and NULL.

I *think* updates will actually be faster because instead of updating I
will simply insert into ContactUsers and not have to lookup the Contact row
and update its UserID field. But I'm concerned about queries.

I'm also concerned about when a User changes their details. Now I will
have to DELETE from the ContactUsers table then INSERT if there are
different correlations. Before I would just update the contact rows with
the modified user to NULL and re-correlate.

Should I be concerned about the performance of this approach? In addition
to getting rid of NULLs and approach like this would help me reduce the
number of columns in some tables. But that SELECT for each column thing has
me concerned since the queries are very fast now using a NULL field.

Thanks!

Browse pgsql-performance by date

  From Date Subject
Next Message fburgess 2014-01-24 16:23:32 PostgreSQL 9.3.2 Performance issues
Previous Message Josh Berkus 2014-01-22 20:01:39 Re: Time of query result delivery