From: | "mark overmeer" <markovermeer(at)gmail(dot)com> |
---|---|
To: | pgsql-performance(at)postgresql(dot)org |
Subject: | schema design question |
Date: | 2007-08-19 13:19:52 |
Message-ID: | ea41f7570708190619u6daf7cep96430d20566d6159@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Hi,
Maybe not completely the wright place to ask but... I have this schema
design question (db is postgres of course). I have a couple of classes with
attributes. The only goal is to search the object that I want to find (which
is stored on the harddrive).
I have hundreds of classes that are similar but not the same. They all have
attributes/properties (type is probably String), e.g. (in pseudo code):
class A_version_1 {
attribute1, attribute2, attribute3, ..., attributeN
}
class A_version_2 {
attribute1, attribute3, ..., attributeN, attributeN+1, attributeN+2
}
class B_version_1 {
attribute3, attribute4, attribute7, attributeN+3, ..., attributeN+M
}
Class A will have attributes from class B, class B will have attributes from
class C and so on. My initial thought was to use the (sometimes dreaded) EAV
model: class_id, object_id, attribute_id and attribute_value. In this way I
can make queries like:
SELECT CLASS_ID,
OBJECT_ID
FROM EAV_TABLE EAV
WHERE EAV.ATTRIBUTE_ID = X
AND EAV.ATTRIBUTE_VALUE = 'searchstring'
AND EXISTS (SELECT OBJECT_ID
FROM EAV_TABLE EAV2
WHERE EAV.OBJECT_ID = EAV2.OBJECT_ID
AND EAV.CLASS_ID = EAV2.CLASS_ID
AND EAV2.ATTRIBUTE_ID = Y
AND EAV2.ATTRIBUTE_VALUE = 'searchstring2')
Results from this query could be entities from multiple classes!
The alternative is, as many people say: make a proper table for each class
which would lead to hundreds of unions. Is that good/performant? I thought
it would not... To put all attributes of all classes (as columns) in one
table is impossible. The number of total attributes should be in the
thousands.
A third alternative I came up with is the entity/value schema design where
each attribute would have its own table. A query would look like this:
SELECT CLASS_ID,
OBJECT_ID
FROM EV_X EAV
WHERE EAV.ATTRIBUTE_VALUE = 'searchstring'
AND EXISTS (SELECT OBJECT_ID
FROM EV_Y EAV2
WHERE EAV.OBJECT_ID = EAV2.OBJECT_ID
AND EAV.CLASS_ID = EAV2.CLASS_ID
AND EAV2.ATTRIBUTE_VALUE = 'searchstring2')
Which would be a nice way to partition the otherwise large table (but there
would be thousands of smaller tables).
The app I'm writing has to scale to about 1 billion attributes/value-pairs
in total. A normal search query would imply about 5 search terms (but there
could be 20). Any suggestions/remarks (I think the EXISTS should be replaced
by an IN, something else)? Did anyone implement such a search method (or did
they decide to make a different design)? Did it work/scale?
Thanks in advance,
Mark O.
From | Date | Subject | |
---|---|---|---|
Next Message | Niklas Saers | 2007-08-19 14:01:55 | Looking for database hosting |
Previous Message | Kevin Grittner | 2007-08-19 01:11:59 | Re: Help optimize view |