Re: [HACKERS] Fwd: Joins and links

From: Clark Evans <clark(dot)evans(at)manhattanproject(dot)com>
To: Leon <leon(at)udmnet(dot)ru>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] Fwd: Joins and links
Date: 1999-07-06 13:31:05
Message-ID: 37820519.9938D467@manhattanproject.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Leon wrote:
> No one is talking about abolishing any standard SQL feature. After
> you carefully verified design you can hard-code links to speedup
> access. Before that has happened the usual SQL will do.

Leon,

Interesting idea. You are re-introducing some of the
hierarchical database ideas, is this right? Here is
what I'm receiving, could you correct me if I'm
mis-understanding? (much of this you did not say...)

- - - - - -

In current SQL implementations, if updates are done on a
tuple being modified, referencing tables are not usually
identified or checked, let alone updated. Then, when a
query is requested, the database figures out how referenced
data can be retrieved, and does this at the time of the query.

In this proposal, in addition to carrying a primary key
for a referenced table, tuples in the referencing table
will also have a place to record the physical address
of each referenced tuple. In this way, referenced
data is easily retrieved during a query, since the
physical address of the referenced information is
stored in the referant.

For example, lets take the following schema

ORDER (ORDER_ID, ... );
PRODUCT (PRODUCT_ID, NAME, ... );
ORDER_LINE (ORDER_ID,PRODUCT_ID, ... );

In the current cases, changes to the PRODUCT table,
let's say a changed name, do not result in an update
of the ORDER_LINE tuples which reference the product
tuple being changed.

In this proposal, a few hidden field (ID/REF) would be added:

ORDER ( LEON_ID, ORDER_ID, ... );
PRODUCT ( LEON_ID, PRODUCT_ID, NAME, ... );
ORDER_LINE ( LEON_ID, ORDER_ID, PRODUCT_ID, ... , PRODUCT_LEON_REF );

Where the ORDER_LINE table would have a reference to the
physical LEON_ID of the tuple being referenced by PRODUCT_ID.

Then, an update of the PRODUCT table would result in a cascading
update of all referencing tables, including ORDER_LINE to
change the PRODUCT_LEON_REF from its previous value to the
update value. The LEON_ID and LEON_REF are internal implementation
fields and not available though SQL.

SUMMARY,

With this method, query speed is drastically improved since
the "join" logic is performed once during insert, instead
of many times during select.

This method should work well, when the referencing table
changes relatively infrequently. Thus people, products,
and other relatively static "reference" information is
a key canidate for this 'indexing' technique.

This technique should not be used if the frequency of
updates exceed the frequency of select statements.

- - - - - - -

Overall, I think it is a good idea. I frequently do weaker
versions all the time that I call "field cashing", where
the NAME field of infrequently chaging tuples are frequently
accessed. In this case, one may put PRODUCT_NAME in the
ORDER_LINE table and put a trigger on PRODUCT to cascade
update of NAME to the ORDER_LINE.PRODUCT_NAME table.
I tend to make monsters like this a nightly process, since
product name changes need not be immediate (they are rare,
and thus not frequent, and thus, not usually immediate).
This allows the cascade update to run at night when
things are alot less stressful on the database.

Is this in-line with what you are saying?

I suggest borrowing an XML term for the idea, GROVE.
In XML, a GROVE is a tree built from XML/SGML/notations.
In this case, you can think of frequently joined
information as cutting deep into the landscape, thus
the more the query is done, the more of a chance that
the UPDATE/SELECT ratio wil be small, and thus, the
greater chance that the hard wired physical address
is cashed in the table. The reason I like the
name, is that it defines a common pathway that is
easy, without preventing the efficiency of uncommon
paths (where updates >> select ).

Hmm. I'm just worrying about the CASCADE nature
of the beast. On the extreme that I was writing
about earlier, a prototype OO dbms that I was
looking at about 6 years ago (god knows what the
name is), they did *everything* this way. And
GOD it was slow... especially since it cascaded
when frequency of updates far exceed the frequency
of selects.

Thoughts?

Best,

Clark

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Mount 1999-07-06 13:41:30 RE: [HACKERS] CVS, Java etc
Previous Message Leon 1999-07-06 12:29:17 Re: [HACKERS] Fwd: Joins and links