Re: Design question regarding arrays

From: Michael Glaesemann <grzm(at)myrealbox(dot)com>
To: A Gilmore <agilmore(at)shaw(dot)ca>
Cc: pgsql-novice(at)postgresql(dot)org
Subject: Re: Design question regarding arrays
Date: 2004-08-08 14:06:22
Message-ID: 20FD3532-E944-11D8-B87D-000A95C88220@myrealbox.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice


On Aug 8, 2004, at 2:03 PM, A Gilmore wrote:

> Regularly Ill have to search this db for records containing a specific
> catagory. For example get all books belonging to the sci-fi catagory,
> for the most parts the searchs will only look for a single catagory.
> They would look like this :
>
> SELECT * FROM my_table WHERE 'sci-fi' = ANY (catagory);
>
> Is that bad design?

Probably. I'd set up a categories table that lists the possible
categories. Then create a books_categories table that lists the
categories for each book. For example:

create table books (
book_id serial not null unique
, title text not null
);

create table categories (
category_name text not null unique
);

create table books_categories (
book_id integer not null
references books (book_id)
on update cascade on delete cascade
, category_name text not null
references categories (category_name)
on update cascade on delete cascade
, unique (book_id, category_name)
);

insert into books (title) values ('Visual Explanations');
insert into books (title) values ('The Visual Display of Quantitative
Information');
insert into books (title) values ('The Mythical Man-Month');

select book_id, title from books;
book_id | title
---------+------------------------------------------------
1 | Visual Explanations
2 | The Visual Display of Quantitative Information
3 | The Mythical Man-Month
(3 rows)

insert into categories (category_name) values ('graphic design');
insert into categories (category_name) values ('non-fiction');
insert into categories (category_name) values ('fiction');
insert into categories (category_name) values ('project management');

insert into books_categories (book_id, category_name) values
(1,'graphic design');
insert into books_categories (book_id, category_name) values
(2,'graphic design');
insert into books_categories (book_id, category_name) values
(3,'project management');
insert into books_categories (book_id, category_name) values
(1,'non-fiction');
insert into books_categories (book_id, category_name) values
(2,'non-fiction');
insert into books_categories (book_id, category_name) values
(3,'non-fiction');

-- display books and their categories using explicit joins

select
title
, category_name
from books
join books_categories using (book_id)
order by title, category_name;
title | category_name
------------------------------------------------+--------------------
The Mythical Man-Month | non-fiction
The Mythical Man-Month | project management
The Visual Display of Quantitative Information | graphic design
The Visual Display of Quantitative Information | non-fiction
Visual Explanations | graphic design
Visual Explanations | non-fiction
(6 rows)

-- or with implicit joins using the WHERE clause

select
title
, category_name
from books as b
, books_categories as b_c
where b.book_id = b_c.book_id
order by title, category_name;
title | category_name
------------------------------------------------+--------------------
The Mythical Man-Month | non-fiction
The Mythical Man-Month | project management
The Visual Display of Quantitative Information | graphic design
The Visual Display of Quantitative Information | non-fiction
Visual Explanations | graphic design
Visual Explanations | non-fiction
(6 rows)

-- selecting only graphic design books

select title
from books
join books_categories using (book_id)
where category_name = 'graphic design';
title
------------------------------------------------
Visual Explanations
The Visual Display of Quantitative Information
(2 rows)

For convenience, I'd probably also create a view that joins book
information and categories, such as:

create view books_categories_view as
select book_id, title, category_name
from books
join books_categories using (book_id);

Then you don't have to write the join each time. You use the like a
table:

select
book_id
, title
, category_name
from books_categories_view
order by title, category_name;
book_id | title |
category_name
---------+------------------------------------------------
+--------------------
3 | The Mythical Man-Month | non-fiction
3 | The Mythical Man-Month | project
management
2 | The Visual Display of Quantitative Information | graphic
design
2 | The Visual Display of Quantitative Information | non-fiction
1 | Visual Explanations | graphic
design
1 | Visual Explanations | non-fiction
(6 rows)

select title from books_categories_view where category_name = 'graphic
design';
title
------------------------------------------------
Visual Explanations
The Visual Display of Quantitative Information
(2 rows)

You might be wondering what the advantage of the categories table is,
as I haven't used it at all in any of the queries. One reason is that
it keeps a set list of categories. If you have the categories all in
one place, you are probably less like to set up similar categories,
such as 'Graphic Design' and 'graphic art design'. Also, if you have
more information about the categories, for example a longer description
of what the category covers, you keep that in the category table,
rather than duplicating it in the books_categories table. (An important
part of database design is normalization, which in essence is reducing
the amount of duplication in a database.) Some people would also prefer
setting up a category_id rather than using the text category_name. This
situation would look a little different:

create table categories (
category_id serial not null unique
, category_name text not null unique
, category_description text not null unique
);

create table books_categories (
book_id integer not null
references books(book_id)
on update cascade on delete cascade
, category_id integer not null
references categories(category_id)
on update cascade on delete cascade
, unique (book_id, category_id)
);

create view books_categories_view as
select book_id, title, category_id, category_name
from books
join books_categories using (book_id)
join categories using (category_id);

There'd be similar changes to the queries as well.

Unless you're dealing with data that is naturally an array (for
example, some math constructs with array data), I'd recommend sticking
with setting up tables and join tables. Personally, I have yet to use
an array type. Relational theory (upon which SQL is based) works well
for tables (relations). I'd recommend picking up a book on database
design, especially normalization. There's a lot of information
available on the web as well.

Michael Glaesemann
grzm myrealbox com

In response to

Responses

Browse pgsql-novice by date

  From Date Subject
Next Message Steve Tucknott 2004-08-08 19:00:56 Multiple return 'columns' from postgre pl/pgsql
Previous Message Fredrik Jonson 2004-08-08 09:31:49 Re: Design question regarding arrays