Re: Thus spoke SQL3 (on OO)

From: Chris <chris(at)bitmead(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Thus spoke SQL3 (on OO)
Date: 2000-05-21 01:28:13
Message-ID: 39273BAD.CA8F3AF5@bitmead.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Peter Eisentraut wrote:

Excellent research Peter....

> Let's therefore
> assume that the syntax should really be more something like this:
>
> CREATE TABLE <table name> <table element list> <subtable clause>
> | CREATE TABLE <table name> <table element list>
> | CREATE TABLE <table name> <subtable clause>

I agree.

> ** Column naming and ordering
<snip>
> This would make some sort of sense if the originally defined columns
> came before the inherited ones, but the way it stands it doesn't help
> a lot.

Yes, super-class first seems reasonable.

> The resolution of names is done as follows. First, a list of all
> inherited columns is created. If one column is "replicated", that is,
> more than one supertable inherited it from the same super-supertable,
> all but the first occurrence is dropped. Then the column renaming
> clauses are applied. The resulting list must not contain a duplicate
> column name.
>
> This scheme is quite different from the current PostgreSQL
> implementation, which merges columns of equal name and datatype.

In the absense of renaming, I don't think this amounts to anything
different to PostgreSQL.

> It
> has been mentioned before during OO discussions that the association
> of inherited columns by name alone should probably be dropped and
> replaced by pg_attribute.oid references. This would seem like a good
> thing to do because it would allow us to detect replicated columns
> reliably and give a chance to the column renaming option.

That would mean I guess sharing pg_attributes between different classes
in a hierarchy, which I assume doesn't happen now. But sounds good.

> "The OID value is materialized as a character string with an
> implementation-defined length and character set SQL_TEXT."
>
> ... or maybe not. :-)

How does character set affect this? It is different to data type isn't
it?

> What exactly IDENTITY is is still a bit unclear to me but it is
> definitely not the proposed identification of the table a row came
> from.

How do you know?

> * Cloning
> CREATE TABLE name (
> colname type,
> colname type,
> LIKE other_table,
> colname type,
> ...
> );

Hmm. Fairly useless feature IMO.

> ** SELECT
>
> To make a long story short: subtables are automatically examined,
> unless ONLY is specified.
>
> To make the story longer, phrasing and viewing it like this is really
> quite incorrect. Instead:
>
> "Any row of a subtable must correspond to one and only one row of each
> direct supertable. Any row of a supertable corresponds to at most one
> row of a direct subtable."

They've chosen this model to describe how things work. Unless there is
some subtlety I'm missing the model can equally be described by the
"subtables are automatically examined" model. Maybe they thought it was
easier to describe in those terms (I don't, I think it's lame), but it
shouldn't affect implementation. In particular I think implementing it
this way would be about as silly a thing I've ever heard. pgsql has it
right.

> The key word I see here is `correspondence', namely that a given row
> is always a member of both the sub- and the supertable (probably
> having more columns in the subtable obviously) and doesn't belong to
> either of them more then to the other. In other words, the row is
> conceptually shared. Then what the ONLY table reference really does is
> select all rows of a supertable that do not have any corresponding row
> in any subtable. (This is the wording the standard chose.)
> Implementation-wise this would be the easier thing to do (which is
> probably why it's done this way now),

Umm no, it is not the way it is done now (in pgsql). When ONLY is
specified (or rather when "*" is not specified in current pgsql syntax),
it just queries the base class table. It doesn't check "rows of a super
table that do not have any corresponding row in any subtable". The
subtable just doesn't come into it.

If it were implemented that way, then a complex inheritance hierarchy
could result in a join across a dozen tables. Avoiding that is well
worth having to examine several tables in a query. (Most of the time
anyway). Put another way, a UNION is much cheaper than a join.

> ** Insert, Update, Delete
>
> These commands have no special notion of inheritance. Since all rows
> are effectively shared between sub- and supertables you cannot update
> them in one of them "only" without some copy-on-write concept.

ONLY in the case of update and insert would refer to any row which is
does not correspond to any row in a sub-table. Boy I hate talking in
terms of this model, because it's really lame.

I suspect even in this lame model, it doesn't imply copy-on-write. It
just means delete cascades to sub-class tables, delete only doesn't need
to and update ignores sub-classes and update only updates only when it
doesn't correspond to sub-class tuples.

Back to the real world however and it means delete ONLY doesn't go off
searching subclasses and update ONLY doesn't go off searching
sub-classes.

Of course insert isn't affected for the same reason C++ constructors are
not polymorphic.

> Of
> course some rows in a supertable may have no corresponding rows in any
> subtable, but that's nothing the row knows about or cares about. In
> sophisticated inheritance hierarchies, rows and parts of rows may be
> shared in very involved ways, so I foresee some issues with Update
> Only.

Which is why it is totally insane to implement it that way. The way
postgres implements it now is very simple and works.

> (This sounds stranger than it really is: There is no requirement that
> the `corresponding row' is physically stored at the supertable, it is
> only required that it effectively exists there as well, which is
> satisfied if SELECT retrieves it by default.
> In some sense the
> apparently `favoured' storage model here is that all the inherited
> attributes and their values are stored in the supertable heap and only
> the originally defined attributes of subtables are in the subtable
> heap. This method favours the SELECT semantics on supertables because
> it doesn't have to worry about subclasses at all. But queries on
> subtables effectively become joins.)

Boy. I think we should look to how other people have implemented object
models rather than how SQL3 describes the concept. This sounds like a
nightmare.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Mikheev, Vadim 2000-05-21 01:43:37 Berkeley DB...
Previous Message Tom Lane 2000-05-21 01:13:49 Re: More Performance