From: | Edward Stanley <Edward(dot)Stanley(at)mcs(dot)vuw(dot)ac(dot)nz> |
---|---|
To: | David Fetter <david(at)fetter(dot)org> |
Cc: | Richard Huxton <dev(at)archonet(dot)com>, Sean Utt <sean(at)strateja(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, Andrew Hammond <andrew(dot)george(dot)hammond(at)gmail(dot)com>, josh(at)agliodbs(dot)com, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: My honours project - databases using dynamically attached entity-properties |
Date: | 2007-03-13 23:07:45 |
Message-ID: | 200703141207.45615.Edward.Stanley@mcs.vuw.ac.nz |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, 14 Mar 2007, David Fetter wrote:
> On Tue, Mar 13, 2007 at 05:54:34PM +0000, Richard Huxton wrote:
> > David Fetter wrote:
> > >On Tue, Mar 13, 2007 at 02:21:37PM +0000, Richard Huxton wrote:
> > >>David Fetter wrote:
> > >>>On Tue, Mar 13, 2007 at 09:31:45AM +0000, Richard Huxton wrote:
> > >>>>* Another good example is the "questionnaire".
> > >>>
> > >>>With all due respect, this is a solved problem *without EAV or
> > >>>run-time DDL*. The URL below has one excellent approach to this.
> > >>>
> > >>><http://www.varlena.com/GeneralBits/110.php>
> > >>
> > >>Which broadly speaking was the solution I used for my
> > >>questionnaire, except I had a restricted set of types so basically
> > >>just coerced them to text and side-stepped the inheritance issue.
> > >>To the extent that it's dynamic, it's still just EAV though.
> > >
> > >That's precisely the difference between the above solution and
> > >yours, and it's the difference between a good design and one that
> > >will come up and bit you on the as^Hnkle.
> >
> > It's still basically EAV (either approach). The key fault with EAV
> > is that the tables have no semantic meaning - answer_int contains
> > number of oranges, days since birth and the price of a tube ticket
> > in pennies.
>
> Stuffing all of those into an answer_int is *precisely* what the end
> user must not do. That's pilot error.
>
> > Now, with a questionnaire that might not matter because everything
> > is an "answer" and you're not necessarily going to do much more than
> > count/aggregate it.
>
> See above.
>
> > >>It doesn't remove the need for run-time DDL if you allow users to
> > >>add their own questions.
> > >
> > >Sure it does. When a user, who should be talking with you, wants
> > >to ask a new kind of question, that's the start of a discussion
> > >about what new kind(s) of questions would be generally applicable
> > >in the questionnaire schema. Then, when you come to an agreement,
> > >you roll it into the new schema, and the whole system gets an
> > >improvement.
> >
> > Fine, but if you're not letting the user extend the system, then
> > it's not really addressing Edward's original posting, is it?
>
> It's my contention that Edward's original idea is ill-posed. SQL is
> just fine for doing this kind of thing, and it's *not that hard*.
>
> > If the user's talking to me, I might as well just write the DDL
> > myself - it's the talk that'll take the time, not writing a dozen
> > lines of SQL.
>
> It's the talk that's the important part. Machines are really bad at
> seeing the broader picture. In the attempt to "save" a few minutes'
> discussion, he's trying to borrow that time from a system asked to do
> things that computers are inherently bad at doing, and every end user
> will pay that time back at a very high rate of interest. This is
> precisely the kind of false economy that so plagues software
> development and maintenance these days.
>
> > The interesting part of the problem (from a Comp-Sci point of view)
> > is precisely in automating part of that discussion. It's providing
> > an abstraction so that you don't end up with a mass of attributes
> > while still providing freedom to the user.
>
> This freedom and efficiency you're talking about is better supplied,
> IMHO, by putting a standard DDL for questionnaires up on a pgfoundry
> or an SF.net. That way, improvements to the DDL get spread all over
> the world, and a very large amount of wheel reinvention gets avoided.
> Reusable components are a big chunk of both freedom and efficiency. :)
>
> Cheers,
> D
Maybe I should rethink the problem a bit - from the very brief initial
research I've done, it seems EAV schemas have two common uses:
1) When new attributes have to be created on-the-fly
2) When the number of possible properties for an entity greatly (orders of
magnitude) exceeds the number of properties any one entity is likely to have.
I'm not sure about solving the first problem - there seems to be a lot of
debate around this. I can see reasons for and against allowing this. However
I think the second is a very real problem. One such example is a patient
record system.
For each patient we have a table of common data (dob, sex, height, weight etc)
but as well as this a patient can present with many symptoms. This might be a
table of 40,000 possible symptoms.
Lets say we want to run a query on these symptoms (using a boolean expression)
to return the patient records which match the query string on the symptoms.
(This turns out to be a very similar problem to the 'tags' example I first
presented) - assume a similar schema. With more than a couple of symptoms and
a complex tree, the resulting SQL can span pages.
When I first started thinking about this project I believed the two problems
essentially to be the same class of problem, but this may not be the case.
What do people think?
- Also, thanks everyone for your input thus far. It has been very valuable.
Eddie Stanley
From | Date | Subject | |
---|---|---|---|
Next Message | Greg Smith | 2007-03-14 04:13:04 | Re: Log levels for checkpoint/bgwriter monitoring |
Previous Message | Tom Lane | 2007-03-13 21:22:04 | Re: Bug in VACUUM FULL ? |