Quick Links

Re: Dependency graph of all tuples relied upon in a query answer

From:	Randall Lucas <rlucas(at)tercent(dot)com>
To:	Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc:	Randall Lucas <rlucas(at)tercent(dot)com>, pgsql-general(at)postgresql(dot)org
Subject:	Re: Dependency graph of all tuples relied upon in a query answer
Date:	2006-09-01 01:54:46
Message-ID:	20060901015446.GB17085@ontology.tercent.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

On Thu, Aug 31, 2006 at 09:17:54AM -0400, Merlin Moncure wrote:
> On 8/30/06, Randall Lucas <rlucas(at)tercent(dot)com> wrote:
>
> >An inverted way of thinking about the problem is the notion of getting a
> >source document (say, a "company registration form") and parsing and
> >storing it in a normalized format. How do you "tag" the origin of each
> >and every row in every table that resulted from that source document?
>
> your form should have a code, either entered by the customer or by the
> preparer who enters it into the database, which becomes the key that
> identifies the registration document. Put that key into other tables.

Yes -- I have flirted with this idea for the case of importing a set of
facts from a single source. But where it breaks down is in being able
to do the reverse -- ex post facto, to certify the results of a query
as being accurate, and thereby attesting to the underlying facts.

> be careful, you are flirting with EAV thinking. I think EAV designs
> are terrible.

(Though religion is not at issue here, I am pretty devoutly relational.
I just want a good way to perform audits and other functions on my
relations.)

> > It seems to me that the elegant way to do this would be to get the
> >entire graph of dependencies for not only all tables that reference the
> >entity, but only those rows within those tables that refer specifically
> >to my entity.
> >
> >The query engine should have a pretty good idea about which tables and
> >which rows would actually be used in forming the responses to a given
> >query. How can I get this information? Or am I off the deep end (-ency
> >graph) with this one?
>
> I am not sure where you are going with this. Maybe you should mock up
> some simple tables and repost your question.

Imagine that I am the SEC. I look for cases of insider trading, and
when I find a case, I have to go back and double-check all the pieces
of evidence I am using against them.

create table company (
id serial primary key,
name text not null
);

create table officer (
id serial primary key,
company_id int not null references company(id) on delete cascade,
name text not null,
title text not null
);

create table insider_trade (
id serial primary key,
officer_id int not null references officer(id) on delete cascade,
shares_traded numeric,
share_price numeric,
trade_date timestamptz
);

insert into company (name) values ('goodco');
insert into company (name) values ('badco');
insert into officer (company_id, name, title) values (1, 'johnny b. good', 'ceo');
insert into officer (company_id, name, title) values (1, 'mother teresa', 'saint');
insert into officer (company_id, name, title) values (2, 'leroy brown', 'ceo');
insert into insider_trade (officer_id, shares_traded, share_price, trade_date) values
(3, '50000', '6.66', '2006-07-04 1:23 PM PST');

Now, my database looks like this:

select * from company left join officer on company.id=officer.company_id left join insider_trade on officer.id=insider_trade.officer_id;
id | name | id | company_id | name | title | id | officer_id | shares_traded | share_price | trade_date
----+--------+----+------------+----------------+-------+----+------------+---------------+-------------+------------------------
1 | goodco | 1 | 1 | johnny b. good | ceo | | | | |
1 | goodco | 2 | 1 | mother teresa | saint | | | | |
2 | badco | 3 | 2 | leroy brown | ceo | 1 | 3 | 50000 | 6.66 | 2006-07-04 14:23:00-07
(3 rows)

If I want to know whom to investigate, I might do a query like this:

select * from company left join officer on company.id=officer.company_id left join insider_trade on officer.id=insider_trade.officer_id where insider_trade.id is not null;
id | name | id | company_id | name | title | id | officer_id | shares_traded | share_price | trade_date
----+-------+----+------------+-------------+-------+----+------------+---------------+-------------+------------------------
2 | badco | 3 | 2 | leroy brown | ceo | 1 | 3 | 50000 | 6.66 | 2006-07-04 14:23:00-07
(1 row)

Now that I have this query, in order to make my case, I need to "sign
off" on all of the individual data that went into it. I would like to
do something like:

(I am aware that, at least theoretically, my query is just as reliant
upon the tuples that it *didn't* show, since they were acted upon by
the where clause.)

(I am also aware that this example is sub-optimal in that the primary
keys are all single ints called 'id' and that they are present
individually in the query result -- imagine that I used an aggregate
function in a GROUP BY, for example, and you can see how the ids relied
upon would become opaque, or imagine a multicolumn pk and you
understand my odd notation.)

Is something like this even possible, much less doable at present? I
have an inkling that with all of the capabilities of information_schema,
rules, and triggers, it should be pretty close to doable.

Best,

Randall

--
Randall Lucas Tercent, Inc. DF93EAD1

In response to

Re: Dependency graph of all tuples relied upon in a query answer at 2006-08-31 13:17:54 from Merlin Moncure

Responses

Re: Dependency graph of all tuples relied upon in a query answer at 2006-09-01 13:30:57 from Merlin Moncure

Browse pgsql-general by date

	From	Date	Subject
Next Message	Alvaro Herrera	2006-09-01 03:15:11	Re: [GENERAL] Thought provoking piece on
Previous Message	Robert Treat	2006-09-01 00:55:11	Re: Status on Fedora Core Extras packaging