Re: A general database question!

From: Dmitry Tkach <dmitry(at)openratings(dot)com>
To: Jeff Davis <list-pgsql-general(at)dynworks(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: A general database question!
Date: 2002-03-22 17:07:54
Message-ID: 3C9B64EA.5010103@openratings.com
Lists: pgsql-general pgsql-sql

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title></title>
</head>
<body>
<blockquote type="cite" cite="mid:200203212329(dot)PAA17841(at)mail(dot)ucsd(dot)edu">
<blockquote type="cite">
<pre wrap=""><br>It is simple so far - I'd just create an index on both location and<br>name...<br><br></pre>
</blockquote>
<pre wrap=""><br>I am not quite understanding why a multi-column index won't work. Can you <br>explain a little more?<br><br>Here is what I was thinking:<br> create table entity(<br> type int,<br> location int,<br> name varchar(100) not null<br> );<br> create unique index my_index on entity(location,name);<br><br>It seems as though an "and" query would run quickly. It also seems as though <br>if you needed to get a few records that were one of several names, a query <br>such as the following would run quickly:<br>select * from entity where location = 5 and (name = 'joe' or name='joseph');<br><br>or are the aliases a separate attribute? I am not clear on exactly where the <br>alias comes in. <br></pre>
</blockquote>
Well... Yes, that's the problem - aliases ARE a separate attribute. The SAME
entry could be known as both 'joe' and 'joseph', so, as far as I can see,
I have two choices, as I explained earlier: either create two
tables, one holding just the entry id and location, and the other holding
id, name and type (or 'nametype' if you will), linked to the first one, or
have one table with multiple rows corresponding to the same entry.<br>
<br>
The problem is that I don't like either of these solutions :-(<br>
The first one isn't good enough because I can't create an index across two
tables, so the search by location and name would be problematic...<br>
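<br>
A minimal sketch of that first option, to make the indexing problem concrete. The table, column and index names below (entity, entity_name, nametype and so on) are made up for illustration and are not an actual schema:<br>

```sql
-- Hypothetical two-table layout: one row per entity, one row per name/alias.
CREATE TABLE entity (
    id       integer PRIMARY KEY,
    location integer NOT NULL
);

CREATE TABLE entity_name (
    entity_id integer NOT NULL REFERENCES entity (id),
    name      varchar(100) NOT NULL,
    nametype  integer NOT NULL,
    UNIQUE (entity_id, name)
);

-- Each index lives on a single table; nothing covers (location, name) together.
CREATE INDEX entity_location_idx ON entity (location);
CREATE INDEX entity_name_name_idx ON entity_name (name);

-- A search by location and name therefore has to go through a join.
SELECT e.id, e.location, n.name
FROM entity e
JOIN entity_name n ON n.entity_id = e.id
WHERE e.location = 5
  AND n.name IN ('joe', 'joseph');
```

With this layout the location is stored once per entity, but the planner has to combine two separate indexes through the join instead of probing one index on (location, name).<br>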
<br>
The second one sucks, because it duplicates all those locations, and because
it is not normalized.<br>
<br>
Your suggestion looks like the second of these solutions... First of all,
the table you suggest would need another column, say, entityID, telling
which entity this name entry belongs to. This column can't even be a serial
or a primary key, because there will have to be multiple rows with the same
id (to hold aliases)... Now, suppose a particular entity has 10 different
names - this would create 10 different rows in the table, all of which are
supposed to have the same id and location...<br>
For one thing, I'll waste 9*sizeof(location) bytes for each such entity
(10 identical location values, instead of the only one really needed)... And
secondly, how am I going to enforce consistency when an entity 'moves'
(its location gets changed)? All ten rows have to be updated... In other
words, I would need some means of ensuring that for every pair
of rows in that table, if row1.id=row2.id then row1.location=row2.location...<br>
This calls for another complication of the schema - putting a trigger on
the table to ensure the consistency....<br>
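<br>
A rough sketch of what such a trigger might look like, assuming the single-table layout with an added id column. The function and trigger names are invented for illustration, the "last write wins" propagation policy is just one possible choice, and the syntax assumes a reasonably recent PostgreSQL with PL/pgSQL available:<br>

```sql
-- Hypothetical single-table layout: one row per (entity id, name).
CREATE TABLE entity (
    id       integer NOT NULL,
    location integer NOT NULL,
    name     varchar(100) NOT NULL,
    UNIQUE (id, name)
);

-- Copy a new location to every other row of the same entity.
-- The nested updates fire this trigger again, but by then no row
-- differs any more, so the recursion stops after one extra round.
CREATE FUNCTION entity_sync_location() RETURNS trigger AS $$
BEGIN
    UPDATE entity
    SET location = NEW.location
    WHERE id = NEW.id
      AND location IS DISTINCT FROM NEW.location;
    RETURN NULL;  -- the return value of an AFTER row trigger is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER entity_sync_location_trg
AFTER INSERT OR UPDATE OF location ON entity
FOR EACH ROW EXECUTE PROCEDURE entity_sync_location();

-- Sanity check: should return no rows if the invariant holds.
SELECT id
FROM entity
GROUP BY id
HAVING count(DISTINCT location) != 1;
```

Every insert or location update then pays for an extra UPDATE over all rows of the same entity, and that is the per-row cost in question.<br>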
<br>
While this all seems doable, please keep in mind that what we are discussing
here is just a simple example... It gets more complicated if you take
into consideration, for example, that name and location are likely
not the only two attributes of an entity - there could be many more
of them... Also, consider that you have LOTS of those entities (tens of
millions), each having, say, 20 attributes - imagine how much space would
be wasted on duplication, and how much time would be wasted executing that
trigger... :-(<br>
<br>
Does this clarify things?<br>
<br>
Thanks!<br>
<br>
Dima<br>
<br>
<br>
</body>
</html>
