From: | Archibald Zimonyi <archie(at)netg(dot)se> |
---|---|
To: | pgsql-sql(at)postgresql(dot)org |
Subject: | Re: NULL values or not? |
Date: | 2001-12-21 17:22:54 |
Message-ID: | Pine.LNX.4.21.0112211807470.21359-100000@valdez.netg.se |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-sql |
Thank you, thank you, thank you, thank you....
Your post, Josh, was very informative for me. As it is now, my Vampire
schema, which I forgot to add as an attachment, doesn't allow a single
NULL value. I do have a small amount of data, which is why things might
still work as fast as they do.
The start of my problems is still the fact that when I do VACUUM ANALYZE
my main query takes almost a minute (from around a second), which has made
me perlexed by the whole thing, since the EXPLAIN shows relatively low
values (although reading that is still an art, I have a query which
results in more rows and has higher EXPLAIN values but still takes less
time then my main query).
I have read a book which I find immensly useful and good, called:
The Practical SQL Handbook
published by Addison-Wesley. It is this book that has given me my main
view of SQL and database design. I think my database uses a lot of the
normalization rules which is why I try to avoid NULL values. An example
from my own Vampire database follows:
CREATE TABLE cards
(
card_id int2 DEFAULT NEXTVAL('cards_seq'),
cardname text NOT NULL,
PRIMARY KEY(card_id)
);
CREATE TABLE cards_names
(
card_id int2 NOT NULL,
cardname text NOT NULL,
FOREIGN KEY (card_id) REFERENCES cards
);
The table cards include the primary names of all cards. Each card is
unique, giving it a unique id. Some cards have two names, due to the fact
that there were misprints etc.
So the table cards_names includes those few cards which might have two
names.
CREATE TABLE cards_costs
(
card_id int2 NOT NULL,
cost int2 NOT NULL,
pool bool NOT NULL,
FOREIGN KEY (card_id) REFERENCES cards
);
Some cards, but not all, have a certain cost to play. Instead of having a
"cost" column in "cards" and then having NULL values where the cards have
no cost, I once again decided to create a new table connecting the cost to
a card with it's id. So cards_costs also includes less cards then do the
table cards.
It is this I based my questions on, I know why I do it the way I do, but I
would also like to hear what other people think of this solution, and
perhaps why I should use NULLS instead. As I said in an earlier post,
there is a cost of joining the tables, which is higher then having one
table with NULLS.
Once again, thanks for the replies folks,
Archie
On Fri, 21 Dec 2001, Josh Berkus wrote:
> Archibald,
>
> > So again, I have no problems with NULLS, I just would like someone to
> > share when they use them contra when they don't.
>
> Thank you for bringing a theory question onto the list! We spend much
> of our time on pgsql-sql focused on practical minutia. Thus several of
> the respondants who siezed on your hypothetical example rather than
> answering the question. I'd love to see some of the more experienced DB
> theorists weigh in on this issue (Tom? Joe C.?).
>
> Relationally speaking, NULLs are a necessary evil. You are right to
> approach them with caution. I cannot count the number of times one of
> my database rescue attempts has involved eliminating, sub-tabling, or
> consolidating NULLable columns.
>
> By coincidence, yesterday I posted this to PGSQL-NOVICE:
>
> ============================================================
> > 3) Use of NULL
> >
> > It would facilitate entering data into TABLE clone if seq, qual...
> > were defined as NULL even though values for these columns will/does
> > exist. If I defined these columns as NULL I would not have to have
> > ALL of the data together at one time, but could enter it in steps.
> > Is this an ill conceived notion?
>
> Well, this purpose is what NULL is for. Cases where data is unknown or
> not yet available. That being said, allowing NULLs takes data integrity
> out of the table design and moves it elsewhere in the software. If you
> allow NULLs for seq, then you will need to create a data integrity
> report that searches for old entries with NULL in the seq column.
> Otherwise, you risk having some required data never filled in.
>
> Also, remember that you can't JOIN on a NULL value. For example, if you
> allow NULLs in, say clones.gb_id, then if you do a report on clones
> JOIN
> gb_accessions, the clones who are missing GB will not show up with a
> blank GB, instead they will not show up at all! You can get around this
> with OUTER JOINS, but OUTER JOINs are not dependable for multi-column
> joins.
>
> Instead, I recommend that everywhere it is possible, you have an actual
> value that indicates why the data has not been filled in. FOr example,
> you could create a gb_accession with the ID of zero (0) (and accn_no,
> gi_no of 0 as well) which would indicate (to you) "gb not run yet".
> This gives you more information than NULL (which could indicate a
> number
> of things: GB not run, GB lost, data error, program error, etc.), as
> well as supporting JOINs cleanly. You could even have more than one
> such value to indicate different reasons for missing info.
> =====================================================
>
> For a more in-depth discussion of NULLs and their problems and
> workarounds, see Fabian Pascal's "Practical Issues in Database
> Management", which has the better part of a chapter on the topic.
>
> -Josh
>
> ______AGLIO DATABASE SOLUTIONS___________________________
> Josh Berkus
> Complete information technology josh(at)agliodbs(dot)com
> and data management solutions (415) 565-7293
> for law firms, small businesses fax 621-2533
> and non-profit organizations. San Francisco
>
From | Date | Subject | |
---|---|---|---|
Next Message | Josh Berkus | 2001-12-21 21:09:57 | Re: NULL values or not? |
Previous Message | Josh Berkus | 2001-12-21 16:55:11 | Re: NULL values or not? |