Re: Research and EAV models

From: Johan Nel <johan(dot)nel(at)xsinet(dot)co(dot)za>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Research and EAV models
Date: 2009-10-24 07:37:11
Message-ID: hbuari$aj0$1@news.eternal-september.org
Lists: pgsql-general

Hi Peter,

I agree 100% with you. EAV can be a good "middle of the road" approach,
as you suggest.

Peter Hunsberger wrote:
> My take on this, for the research world, is to not go pure EAV, but
> rather normalize by some more generic concepts within the domain. Eg.
> "measurement", or "evaluation", etc. You might ultimately end up with
> a sort of EAV model, but the "V" portion is strongly typed within the
> database and you're not trying to cast a string into 20 conventional
> data types. This still requires rigorous metadata management on the EA
> side of the EAV model, but you can tackle that in many ways.
>
>> SQL isn't the be-all and end-all of data storage. It does relational
>> stuff well, and other stuff poorly.
>
> You can build variations on EAV that are closer to a regular
> relational schema. These don't necessarily work well or poorly but
> often, at least in the research world, the middle ground is good
> enough. You are after all, talking about people who spit out MySQL
> databases at the drop of a hat....

I use a very similar approach for managing meta-data: normalize the data
that can be normalized and use EAV for the rest. This eliminates as much
text searching as possible; where a text search is still unavoidable, an
additional WHERE clause on some normalized columns can help a lot with
performance.
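
For example (anticipating the controls table defined just below; the
literal values here are made up), a text match on the EAV column can be
constrained by the normalized columns first, so only a small, indexed
slice of the table is scanned:

SELECT ctrl_id, ctrl_property
FROM controls
WHERE app_id = 'demo'       -- normalized, indexed via the UNIQUE constraint
  AND ctrl_type = 'form'
  AND ctrl_property LIKE '%caption=%';  -- the unavoidable text search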

One of my application meta-data frameworks uses only two tables to store
all the meta-data about an application; they have basically the following
structure:

CREATE TABLE controls (
    ctrl_no       SERIAL PRIMARY KEY,
    app_id        varchar(30) NOT NULL,
    ctrl_type     varchar(30) NOT NULL,
    ctrl_id       varchar(30) NOT NULL,
    ctrl_property text, -- this could also be hstore to add some intelligence
    CONSTRAINT controls_unique UNIQUE (app_id, ctrl_type, ctrl_id));

CREATE TABLE members (
    ctrlmember_no   SERIAL PRIMARY KEY,
    ctrl_no         INTEGER NOT NULL REFERENCES controls(ctrl_no),
    member_no       INTEGER NOT NULL REFERENCES controls(ctrl_no),
    member_type     varchar(30) NOT NULL,
    member_property text, -- this could also be hstore to add more intelligence
    CONSTRAINT member_unique UNIQUE (ctrl_no, member_no));
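
As a quick illustration (all the identifiers and property strings here
are made up, not taken from an actual application), a form with one
text-box member could be stored as:

-- A 'form' control and a 'textbox' control for application 'demo'
INSERT INTO controls (app_id, ctrl_type, ctrl_id, ctrl_property)
VALUES ('demo', 'form', 'orders', 'caption=Orders;width=640'),
       ('demo', 'textbox', 'order_date', 'format=YYYY-MM-DD');

-- Make the text box a member of the form, overriding one default
INSERT INTO members (ctrl_no, member_no, member_type, member_property)
SELECT f.ctrl_no, t.ctrl_no, 'field', 'label=Order date'
FROM controls f, controls t
WHERE f.app_id = 'demo' AND f.ctrl_id = 'orders'
  AND t.app_id = 'demo' AND t.ctrl_id = 'order_date';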

ctrl_property is used to store meta-data based on ctrl_type.

member_property stores meta-data based on member_type, and/or overrides
the default ctrl_property values for the parent ctrl_no it is associated
with.
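
With text columns the member_property simply replaces the default as a
whole; if the columns were hstore instead (as the comments above suggest),
the override could be computed per property key. A rough sketch, assuming
the member's defaults come from its own row in controls:

-- Effective properties of each member: keys in member_property
-- win over the defaults stored on the member's own control row.
SELECT m.ctrl_no AS parent_no, m.member_no,
       COALESCE(c.ctrl_property, ''::hstore)
         || COALESCE(m.member_property, ''::hstore) AS effective_property
FROM members m
JOIN controls c ON c.ctrl_no = m.member_no;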

Without this approach I would have to create more than 100 tables if I
wanted to fully normalize this meta-data.

Many people have indicated that I am actually duplicating many of the
catalog features of a relational database, and I agree. However, it
allows me to port this meta-data onto any user-specified RDBMS without
having to worry about database specifics.

With two [recursive] queries on the above two tables (sketched below) I
can answer most of the fundamental questions regarding the meta-data:

1. Show me the members (parent -> child) of a specific feature.
2. Show me the owners (child -> parent) of a specific feature.
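
A minimal sketch of the first question, assuming PostgreSQL 8.4's WITH
RECURSIVE; swapping the starting condition and the join from member_no to
ctrl_no gives the second, child -> parent variant:

-- All direct and indirect members of the feature with ctrl_no = 1
WITH RECURSIVE tree AS (
    SELECT m.ctrl_no, m.member_no, m.member_type, 1 AS depth
    FROM members m
    WHERE m.ctrl_no = 1
  UNION ALL
    SELECT m.ctrl_no, m.member_no, m.member_type, t.depth + 1
    FROM members m
    JOIN tree t ON m.ctrl_no = t.member_no
)
SELECT * FROM tree ORDER BY depth, member_no;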

Extending the above leads to an easy development plan for a generic
application framework that not only manages the meta-data, but can also
create and run a user interface on the fly, using a couple of variations
on the above two queries and the EAV (%_property) columns to supply all
the default properties and behaviour of the application.

Changing the behaviour of an application then becomes primarily a
database management issue, which means far less application upgrade
management in a distributed environment.

To come back to the original message: yes, there is a place for EAV, not
only in research but even in business data. I have an environmental
software scenario where EAV on the business data gives me an edge over my
competitors: far less time is needed to implement new features compared
to the normal functional decomposition and system development life cycle.

In conclusion, I include an extract from an article by Dan Appleton
(Datamation, 1983) on which my approach is based:

“The nature of end-user software development and maintenance will change
radically over the next five years simply because 500 000 programmers will
not be able to rewrite $400 billion of existing software (which is hostage
to a seven- to 10-year life cycle). They'll be further burdened by those
new applications in the known backlog, as well as by those applications in
the hidden backlog. To solve the problem, dp (data processing) shops must
improve productivity in generating end-user software and provide end-users
with the means of generating their own software without creating
anarchy... The answer to this is a data-driven (meta-data) prototyping
approach, and companies that do not move smoothly in this direction will
either drown in their own information pollution or lose millions on
systems that are late, cost too much, and atrophy too quickly.”

Regards,

Johan Nel
Pretoria, South Africa.
