Re: CREATE TABLE with REFERENCE

From: Dmitry Tkach <dmitry(at)openratings(dot)com>
To: Jonathan Bartlett <johnnyb(at)eskimo(dot)com>
Cc: "kay-uwe(dot)genz" <kug1977(at)web(dot)de>, PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: CREATE TABLE with REFERENCE
Date: 2003-07-29 18:26:45
Message-ID: 3F26BC65.8040907@openratings.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Jonathan Bartlett wrote:

>NOTE - after writing all this, I did think of a possible solution, but I'm
>not sure if PG can handle it. If I made a table called "object" with one
>column, the object_id, and then had EVERY table inherit from this table.
>Then, I could have my constraints set up against this master table.
>
Ummmm... yeah, that would be a great way to do that *if* the
"inheritance" emulation in sql worked that way ...
Unfortunately, it does not :-(

That's exactly the reason I called it 'half-baked' in one of the earlier
messages... When you inherit table B from table A, B will have all the
columns A has, but setting up an FK on table C against A won't work,
because the FK will check for the key to be present in A *itself*, not
in A or any of its inherited children, as one would expect...

What you *could* do, though, if you really wanted is to kinda emulate
that inheritance on your own, with something like:

create table A
(
id serial primary key,
type text not null
);
create unique index a_idx on A (id,type);

create table B1
(
id int primary key,
type text not null default 'b';
stuff text
foreign key (id,type) references A(id,type) on delete cascade on
update cascade deferrable initially deferred
);

create table B2
(
id int primary key,
type text not null default 'b1',
stuff text,
foreign key (id,type) references A(id,type) on delete cascade on
update cascade deferrable initially deferred
);

create table C
(
id int not null references A on delete cascade on update cascade
initially deferred,
note text
);

... now you can insert notes for either B1 or B2 into C

This kinda works, but just seems like too much trouble to go through -
it would be nice if the 'inheritance' support could do something like
that for you automatically, but, since it doesn't, I'd rather stick with
the old good 'plain sql' solution - get rid of A, merge B1 and B2
together (just B), and make C reference B.

>
>
>
>>If you can combine your queries with a union, your table layouts must be
>>very similar if not identical.
>>Why not put everything into the same table then, and just have an FK
>>between that table and the notes?
>>
>>
>
>No, there are _parts_ that are very similar. I don't know where this
>whole "table-combining" kick came from, but I've usually found that it
>ends in a disaster.
>
Then you need to normalize your schema first - extract those "very
similar" parts, and put them into the single table, and make your notes
table reference that one, then create specialized table(s) on the side,
that will contain those columns that are different between the objects,
and make them reference your 'master' table too.

>
>
>
>>What do you mean by soft-delete?
>>Leaving orphaned notes behind? Bad idea...
>>
>>
>
>I have a boolean flag that says "active". I don't ever actually purge
>data. There are times when it is useful to come back in and look at
>what's been "deleted". From the user-interrface standpoint it has been
>deleted, but we can still go back in and retrieve records after they are
>gone.
>
>
Well... That's the 'GUI delete'...
Sooner or later you will want to do the 'real' delete - either to remove
an object that just was created by mistake, and should not be there at
all, or simply to clean up your database, and remove the stuff that has
been sitting there for years, being 'inactive'

>
>
>>How can you be sure? What if you get hit by a bus, and another admin,
>>who is not as knowledgeable as you are goes to the db, and runs an
>>insert with plain sql?
>>
>>
>
>They should read the documentation.
>
I am sure, they will... *after* they screw up the database, and begin
wonderring what's wrong with it :-)

> If they want to run an insert with
>SQL, they should at least be smart about it :) If someone's too stupid to
>read documentation, they are going to screw up the database no matter
>what.
>
Not really... If your database schema is thoughtfully designed, one has
to be *really* smart to be able to screw something up.

>>Also, even if it was indeed only possible because of a broken code, you
>>are not saying that your code is bug-free, are you?
>>
>>
>
>If it's not bug-free, having a good database schema isn't going to save
>me.
>
No, it is not going to save *you*, but it *is* going to save your *data*

>
>
>
>>Even, if it was, postgres is not, and your filesystem and OS are not
>>either. If the database crashes in the middle of your insert, you'll end
>>up having inconsistent data.
>>
>>
>
>Transactions will handle that one.
>
>
No, they won't, unless you actually use them :-)

>
>
>>If I understand your statement ('what would the harm be') correctly, and
>>you just don't care about your data consistency, then, I guess, you are
>>rigfht - you don't need any constraints... but, in that case, I don't
>>
>>
>
>You are missing the point. There are many things that must be balanced:
>
> * Ease of programming / speed of development
>
> * Correctness of code
>
> * Consistency of data
>
>I can get much further with my system on all three points than I can with
>yours. With mine, I have a single, tested function that I can use
>anywhere. This hits off the first two. While this prevents me from
>having a database-checked #3, it still gives me consistent data because we
>don't do hard-deletes and we have tested and verified #2. If you can show
>me how to get te ease-of-programming and correctness with your approach,
>I'd be happy to use it.
>
Easy - merge your tables together the way I described above. Your ease
of programming will not be affected - your obj_html() function will
still work, and won't even require any changes. *Moreover* - if you ever
need to add more object types (or modify the existing ones), you will be
able to do that with much less effort, then you would need to invest
now, with your current schema (in most of the cases, you'll be able to
do those kinds of enhancements without even touching your schema *at all*).

>
>
>
>I do care about data consistency and concurrent access. And my data is
>consistent. I use many of the RDBMS features of postgres such as views /
>subselects, transactions, triggers, etc. The fact that I have instances
>where they fall short and I have to do it in the application just shows
>that our tools are incomplete, not that I don't care about data
>
No, it does not show that they are incomplete, it shows that you are not
using them the right way :-)

>consistency. In fact, I have trouble thinking how anyone could have
>sensical data with everything joined together into one uber-table, which
>you seem to be advocating.
>
Not *one* table. I never advocated that. It is perfectly normal to split
your data into different tables *vertically* (i.e. things that do not
have any intersection between their data, should go into different
tables), but it very rarely (if at all) makes any sense to split it
*horizontally* (so that identical columns sit in different tables, just
because your application interprets them differently) - the 'early'
indication of the problems caused by the latter approach is the
temptation to create 'multitable references' - this can always easily be
avoided by eliminating those 'horizontal divisions', and doing so will
necessarily (and immediately) benefit *all three* of your "balancing
points", that you mentioned earlier.

>We ahve a standard creation process. If someone is an idiot, that can't
>be helped.
>
Not 'helped' entirely, but the amount of damage *can* and *should* be
limited.
It is an unfortunate reality of life - not all people are smart (even,
some of the ones with access to the database are not).
It is definitely a bad programming practice to assume otherwise.

>What if someone pulls out several drives from a running RAID?
>
>What if someone inserts bad records (even if they are inconsistent)? What
>if someone steals the server?
>
>
The first oen and the last one are hardware related. There are ways to
deal with those too (a good start will be putting a lock on the server
room for example), but they are beyond the current topic.

As for inserting bad records - that's *exactly* what properly designed
constraints are supposed to prevent.

>If you wind up with problems, you have to fix them. The fact that
>problems are possible does not make them likely. There is much more
>likelihood of a programmer screwing up writing their 14th copy of
>note_html and the associated table structure than there is that someone
>pulls a number out of their butt.
>
Sure. There should not be copies. Code duplication is a software-world
analog of poorly designed schema problem in the database world.
Both are bad, and should be avoided. I never suggested that you
eliminate the latter at the expense of the former.
Your note_html() looks great to me, and I agree, that it should be kept
that way... it's the database side that look s problematic.

> Where would they get the number if not
>from the sequence? "Oooh, I know, I'll insert 10 as the primary key, jsut
>for kicks! I'll even hardcode that into the app!" Not a very likely
>scenario.
>
Your programmers must be really smart :-)
Are you saying that you have never seen a person writing a piece of sql
like:
insert into mytable (id, data) select max(id) + 1 from mytable, 'mydata'
???

If so, you must be really lucky :-)

>
>
>
>>What if you load your database from a backup and forget to reinit the
>>sequence?
>>
>>
>
>If it's being loaded from backup, the schema reinits the sequence on
>schema-load.
>
>
Exactly. But 'copy from ' does *not* - so, after you have loaded, your
sequnce next_val () will return 1.

>
>
>>If that was the case, you would not be able to combine them with a
>>union, as you said you do...
>>
>>
>
>No, I said that there were certain instances that this happens. Not that
>it is the norm. In the cases where I do combine them with a union, I am
>only combining the similar parts, not the whole shebang.
>
And that's what you should do - similar (common) parts go to the same
table, to which your notes are linked, the differences go to other
tables, and get linked to your master table the same way.
You can then create a bunch of views to make it *look* exactly the same
way as it is now to the programmer, so that your application code will
not be affected at all, but your schema will be nicely normalized, and
you'll be able to set up constraints on it properly.

>
>
>>>Splitting the notes table would be pointless. Why do it?
>>>
>>>
>>>
>>Because that would make it possible to use the constraints.
>>
>>
>
>Maybe the constraint system should be extended to allow it to check
>across multiple tables.
>
Maybe :-)
That was my last suggestion in the original message - you can always
write a trigger to do that...
It would be the worst of the three other possibilities we've considered,
but still better, then what you have now.

>The OID type/column is ideal for this.
>
No, it isn't. Not all tables have oids. The ones that do, do not
guarantee, that they will be unique.
Even if you do that by hand (create unique index on table(oid)), there
is still no way to guarantee their uniqueness across entire database.

> Sadly, it
>is only 32 bits, and they are not storing an OID/table lookup like I think
>they should. That would enable really powerful database applications that
>are way too much work to make today.
>
I don't know what you are talking about :-)
'too much work' ... come on.
If you think of a solution, that is too much work, it does not mean,
that it is the *only* possibility - just think again :-)

> With that kind of thing, you could
>even do record merges with automatic database support.
>
What do you mean by "record merges"?
Any meaning of that phrase I can imagine can be easily done with the
currently supported database features... so, you must mean something
different by that, I assume...

>
>
>
>>Also, if one adopts your earlier point, it can also be argued, that it
>>is equally 'nuts' to have notes about Payments stored together with
>>notes about Sponsors.
>>Those notes have just as much to do with each other as the objects they
>>annotate. :-)
>>
>>
>
>But they are all "notes" on "objects".
>
Sure... and all of your payments, sponsors, and whatever else you have
are 'objects', arent' they?
I did not say that your notes have nothing to do with each other - I
only said that they have only as much to do with each other as your
different kinds of objects do. If putting all the objects (or at least,
some parts of them) into the same table doesn't make sense to you, I
don't see why putting the notes together should.
And the other way around - if treating notes in a generic way does make
sense to you, I don't see any reason why doing the same for the
'objects' should not.

>
>
>
>>If you insist that Payments must be separate from Sponsors, the same
>>exact argument should be applied to their respective notes
>>
>>
>
>Why? All of the note's attributes function exactly the same, while none
>of Payments and Sponsors function the same. Again, you have yet to show
>how you can make a mechanism that is as easily extensible as mine is.
>
I did - just put the damn objects into the same table :-)
It is not just "as easily" extensible as yours, but *much easier*
extensible rather

> The
>fact that database constraints aren't available to support it is a fault
>of the database, not mine.
>
Your "fault" is that they *are* available, but you just refuse to use
them :-)

>Hmmm... changing table names, having to go in and recreate an exact copy
>of the schema every time that is added.
>
I don't know what you are talking about... What table names? What copy
if schema????
You lost me completely.
I *never* suggested anything remotely like that.

>
>
>
>>You want that same one line of code, working against the properly
>>designed and normalized sql schema, that lets you rely on the database
>>top ensure your data consistency and access efficiency.
>>
>>
>
>Yes, but to set it up requires a full database change.
>
Yep. It does. No argument here.
Frankly, I am not trying to convience you that you should do that
database change now (although, if I were you, I would *certainly* do it
right away, before I get into any more trouble - changing - what 5 - 10
tables? - doesn't look like *that* much of a deal to me, compared to the
benefit you'd by from it).
The discussion was about whether or not your schema was *initially*
created in a good way.

Whether or not to change it now is another question - as I said, I
certainly would, but this is your decision entirely, I am not going to
try to talk you into it :-)

>>Not at all. If you give up your idea about splitting your payments from
>>your sponsors for example, you won't need to modify your schema *at all*
>>if you need to add another object type, or another kind of note, or
>>whatever - all you'd need to do would be to implement the new
>>application logic in your application, where it belongs, and be done
>>with it. No need to even touch your schema at all.
>>
>>
>
>I don't see why you want to make my entire database into one giant table.
>Why bother with tables at all in that case?
>
Not one table - see above.
And I don't understand your second question - "why bother with
tables"... what's the alternative?

>
>First of all, inheritance wouldn't work in case of the notes, as it is a
>one-to-many relationship.
>
That doesn't matter (it indeed won't work 'out of the box' - see the
beginning of this email - but for totally different reason)

>
>No, it's not. At least with PostgreSQL, you can only have ONE line of
>inheritance (at least as far as I am aware - please correct me if I'm
>wrong). In that case, all of these would have to have a base class of
>"notes".
>
No. They would have a base class of "Object" (or whatever), and the
'notes' would be linked to the Object.
This would in fact, be a *beatiful* solution... it's a shame really that
it doesn't work.
I am wonderring if what postgres does with those inherited FK
constraints is specified by the standard, or if it is just an
implementation feature, that can be improved...

>Of course, if I wanted to add other similar features, I would be
>unable to. For example, if I wanted to add a "changelog" table, and be
>able to attach last_modified, creator, etc to certain tables, I could not
>do so without adding them to all of them.
>
No, you would be perfectly able to add "features" to base class as much
as you want.
That's the beauty of the inheritance.
The "one-to-one" features can just be added as columns to the base table
(you can actually do that with even the current half-baked inheritance
implementation, you can even do that without any inheritance at all -
if all of your common features sit in the same table, like I suggest,
you can just keep adding columns to that table as you like). The
one-to-many ones can go to additional tables, and get linked to the
object id the same way your notes do.

>
>The approach I have let's me pick and choose any additional feature I want
>to add to any record.
>
... sure. Except if you wanted to add a new feature to *all* of the
objects...
Then you would either have to use your "notes hack", and create a
separate table with broken links, or you would have to go and modify
each and every of your "object" tables separately.
That's another reason for you to extract the common features into a
"supertable" - a change like this would then become a breeze.

>
>
>
>>>Again, this would require modifying and testing that trigger every time I
>>>want to add a new thing to take notes on.
>>>
>>>
>>>
>>Sure, if you implement in such way. But not if you give it some thought
>>in advance, and come up with an implementation that would be generic
>>enough not to care about your application-specific differences between
>>sponsors and payments :-)
>>
>>
>
>Actually, they are database-specific.
>
>
No, they are not :-)
The database doesn't have any notion of payments, sponsors, notes etc.
All this stuff is application specific.
Database specific are tables, columns and constraints. Everything else
is application logic, that belongs to the application.

Dima

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Jonathan Bartlett 2003-07-29 19:00:49 Re: CREATE TABLE with REFERENCE
Previous Message Paul Thomas 2003-07-29 18:24:56 Re: Basic questions before start