Re: generic modelling of data models; enforcing constraints dynamically...

From: InterRob <rob(dot)marjot(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: generic modelling of data models; enforcing constraints dynamically...
Date: 2009-09-27 18:26:27
Message-ID: 671e36b0909271126h11ea06fcm2b5bcb540d5ed1e1@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Dear David, dear all,
I very well understand what you are saying... However, the solution won't be
found in the direction you are suggesting: the system I am designing will be
used by archaeologists, involved in archaeological research (fieldwork).
Their research strategy (and with it their methodology and techniques) may
vary during research, depending on their findings and understanding of the
past that is reconstructed on-site... Along with these methodologies en
techniques, differing data-models may be needed to introduced...

Relationships between these models may be formalised (that's what I (will)
put stakeholders together for); in fact this is what I try to model --
trying to develop a system that will centralize and version the data
gathered. On a supra-project level, in fact.

Meanwhile, I made some progress; in another mail I sent to this same list, I
described the technological challenge at hand as follows:
-----
What I am trying to do is: building views on a base table, extended by one
or more columns, extracted (hence the naming of the function
"deserialize()") from a SINGLE column (XML) **that is in this same base
table** (see below). Instructions for deserialization (that is: which
'fields' to look for) reside in some other table. There are MULTIPLE base
tables, they basically look like this:

[table definition:]
BASETABLE(ID INT, model TEXT, field1 some_type, field2 some_type, ... fieldN
some_type, serialized_data XML)

So, I wish to define multiple VIEWs based on a BASETABLE; one for each
"model" (as stated in the above table definition: "model" is a property for
each row). This QUERY would look like this (producing a VIEW for "MODEL1";
the query below in invalid, unfortunately):

>> SELECT base_t.*, deserialized.* FROM "BASETABLE" base_t,
deserialize('MODEL1', base_t) as deserialized(fieldX some_type, fieldY
some_type) WHERE base_t.model = 'MODEL1';

I have no problem with the requirement to supply the table type in the
query; infact this is logical. Still, this query is impossible, obviously,
because "base_t" as a target is not known in the context of the FROM-clause,
where I whish to use it in calling "deserialize(...)". Ofcourse, I could
write a "deserialize()" function for each base table (e.g.
"deserialize_base1(...)") but I wish it to perform it's action on only rows
that will actually be part of the result set; thus I want the WHERE-clause
to apply to the function's seq scan álso. When provided, I whish to
incorporated the user's WHERE-clause as well; this is done by the PostgreSQL
RULE system...

Alternatively, the VIEW could be defined by the following query:
>> SELECT base_t.*, (deserialize('MODEL1', base_t) as temp(fieldX some_type,
field_Y some_type)).* FROM "BASETABLE" base_t WHERE base_t.model = 'MODEL1';

This approach does not work either: deserialize(...) will return its set of
fields as ONE field (comma separated, circumfixed by brackets); expressions
within a SELECT-list seem to be only allowed to result in ONE column, except
from the * shorthand...

** So, the question is: how can i feed my "deserialize()" function with a
record subset (e.g. filter by a WHERE-clause) of the BASETABLE, while still
returning a record?!? **

I tried the following approach also:
>> SELECT (SELECT fieldX FROM deserialize(base_t) deserialized(fieldX
some_type, fieldY some_type)) "fieldX", (SELECT fieldY FROM
deserialize(base_t) deserialized(fieldX some_type, fieldY some_type))
"fieldY" FROM "BASETABLE" table_t WHERE model= 'MODEL1';

Which infact worked, but caused the function to get invoked TWICE FOR EACH
ROW (even when marked IMMUTABLE STRICT; or did I have to flush cached query
plans in psql?).

Another approach would be to put all key/value pairs into a separate table
(as one would do when implementing a EAV-model within a RDBMS) which is then
to be joined (and joined again... and possibly again (!); in case of
MULTIPLE additional rows -- depending on the definition of the VIEW) onto
the BASETABLE, rather than to deserialize from XML which is stored within
the same record... How does this approach then actually translate in terms
of table scans? Will they be limited by the filter on the BASETABLE, as the
available values to join on will be limited? At any rate: this approach will
be more difficult to implement / maintain in case of EDITABLE VIEWS
(inserts, update, delete)...

Hope any of you has some useful thoughts on this... It appears to me
updating the additional (virtual) fields in the BASETABLE is much easier:
the "serialize()"-function can be fed by a list of key/value pairs,
producing some XML that can be stored in the xml field of "serialized_data",
part of this same base table...
All this needs to be implemented fully in the database back-end; client
application will not know they are talking to VIEWS rather than tables...
Thus: the hosted database must simulate to provide various tables, whereas
these are in fact stored in a limited number of base tables.

----

Thanks in advance, you guys out there!

Rob

2009/9/26 David Fetter <david(at)fetter(dot)org>

> On Thu, Sep 24, 2009 at 06:28:28PM +0200, InterRob wrote:
> > Dear List,
> > I am trying to implement the following:
> >
> > In a database I wish to implement a GENERIC datamodel, thus on a
> > meta-level.
>
> That's not a very bright idea, even though it seems so when you first
> think of it.
>
> Relational database management explicitly trades flexibility for size
> and speed. You won't be able to constrain the things you think you'll
> be constraining, and the query complexity will go up like O(n!).
>
> Instead of going down this dead-end road, get the stakeholders
> together, try a few prototypes of your schema, get them together
> again, etc.
>
> Cheers,
> David.
> --
> David Fetter <david(at)fetter(dot)org> http://fetter.org/
> Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
> Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
>
> Remember to vote!
> Consider donating to Postgres: http://www.postgresql.org/about/donate
>
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Carlo Camerino 2009-09-27 18:27:59 Questions On Tablespace
Previous Message Martin Gainty 2009-09-27 18:22:35 CREATE LANGUAGE workaround