Quick Links

Re: Are there performance advantages in storing bulky field in separate table?

From:	Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
To:	Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>
Cc:	pgsql-general(at)postgresql(dot)org, Ian Mayo <ianmayo(at)tesco(dot)net>
Subject:	Re: Are there performance advantages in storing bulky field in separate table?
Date:	2009-04-09 12:23:35
Message-ID:	200904090823.35876.xzilla@users.sourceforge.net
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

On Wednesday 08 April 2009 18:25:25 Ron Mayer wrote:
> Robert Treat wrote:
> > You can be sure that discussion of this topic in this forum will soon be
> > visited by religious zealots, but the short answer is "nulls are bad,
> > mmkay". A slightly longer answer would be that, as a general rule,
> > attributes of your relations that only apply to 1% of the rows are better
> > represented as a one
>
> To fulfill your prophecy of zealotry, I've got a number of tables
> with columns that are mostly null that I can't think of that nice a
> way of refactoring. I'd love ideas to improve the design, though.
>
> One example's an address table. Most addresses have a few fields
> that are typically present (building number, city, state, etc).
> Others, as described in various government's address standards,
> are fields that are typically absent. For example in US addressing
> rules, the "Urbanization Name" line:
> http://www.usps.com/ncsc/addressstds/addressformats.htm
> MRS MARIA SUAREZ Name
> URB LAS GLADIOLAS Urbanization name
> 150 CALLE A House no. and st. name
> SAN JUAN PR 00926-3232 City, state, and ZIP+4
> Similarly sparse columns in my address tables are,
> titles, division/department Names and mailstop codes.
> (described here: http://pe.usps.gov/text/pub28/pub28c3_011.htm)
>
> While I realize I could stick in some string (empty string, or
> some other magic string like "urbanization name doesn't apply to
> this address") into a table, it sure is convenient to put nulls
> in those columns.
>
> I'm quite curious what you'd suggest a well-designed address table
> would look like without nulls.

The decision here would depend on your perticular sect of the anti-null
religion, but you have a couple of choices:

1) Break these fields out into one or more tables, containing entries only for
those address that have the additional information. Ideally you might be able
to do something like "extended_address_info" where all of these fields could
be kept, all of them being non-null. I suspect you can't do the ideal, so
you'd end up with a bunch of tables.

This would be used by the "normalization trumps nullification" sect

2) Given that all of these columns have an authoritarian source of what should
be allowed, you could use the "magic string" approach without requiring too
much magic, and these columns could even be a foriegn key into a table
containing the authoritarian options.

This could be justified by the all nulls are bad sect, but might also be used
by a null using crowd who take a strict approach to nulls meaning "unknown
value", since here it isn't that the value is unknown; there isn't a valid
value for these columns. (Adding the magic string to your FK table creates a
valid reference value for those entries that would otherwise not match)

Personally, if you force me into a "well-designed address table *without*
nulls" decision, I would take this latter approach. HTH

--
Robert Treat
Conjecture: http://www.xzilla.net
Consulting: http://www.omniti.com

In response to

Re: Are there performance advantages in storing bulky field in separate table? at 2009-04-08 22:25:25 from Ron Mayer

Browse pgsql-general by date

	From	Date	Subject
Next Message	Scott Marlowe	2009-04-09 12:24:28	Re: Will PostgreSQL 8.4 allow having different encoding databases on a same PostgreSQL server ?
Previous Message	Guillaume Lelarge	2009-04-09 12:12:33	Re: Will PostgreSQL 8.4 allow having different encoding databases on a same PostgreSQL server ?