From: | Randall Lucas <rlucas(at)tercent(dot)com> |
---|---|
To: | Merlin Moncure <mmoncure(at)gmail(dot)com> |
Cc: | Randall Lucas <rlucas(at)tercent(dot)com>, pgsql-general(at)postgresql(dot)org |
Subject: | Re: Dependency graph of all tuples relied upon in a query answer |
Date: | 2006-09-01 01:54:46 |
Message-ID: | 20060901015446.GB17085@ontology.tercent.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On Thu, Aug 31, 2006 at 09:17:54AM -0400, Merlin Moncure wrote:
> On 8/30/06, Randall Lucas <rlucas(at)tercent(dot)com> wrote:
>
> >An inverted way of thinking about the problem is the notion of getting a
> >source document (say, a "company registration form") and parsing and
> >storing it in a normalized format. How do you "tag" the origin of each
> >and every row in every table that resulted from that source document?
>
> your form should have a code, either entered by the customer or by the
> preparer who enters it into the database, which becomes the key that
> identifies the registration document. Put that key into other tables.
Yes -- I have flirted with this idea for the case of importing a set of
facts from a single source. But where it breaks down is in being able
to do the reverse -- ex post facto, to certify the results of a query
as being accurate, and thereby attesting to the underlying facts.
> be careful, you are flirting with EAV thinking. I think EAV designs
> are terrible.
(Though religion is not at issue here, I am pretty devoutly relational.
I just want a good way to perform audits and other functions on my
relations.)
> > It seems to me that the elegant way to do this would be to get the
> >entire graph of dependencies for not only all tables that reference the
> >entity, but only those rows within those tables that refer specifically
> >to my entity.
> >
> >The query engine should have a pretty good idea about which tables and
> >which rows would actually be used in forming the responses to a given
> >query. How can I get this information? Or am I off the deep end (-ency
> >graph) with this one?
>
> I am not sure where you are going with this. Maybe you should mock up
> some simple tables and repost your question.
Imagine that I am the SEC. I look for cases of insider trading, and
when I find a case, I have to go back and double-check all the pieces
of evidence I am using against them.
create table company (
id serial primary key,
name text not null
);
create table officer (
id serial primary key,
company_id int not null references company(id) on delete cascade,
name text not null,
title text not null
);
create table insider_trade (
id serial primary key,
officer_id int not null references officer(id) on delete cascade,
shares_traded numeric,
share_price numeric,
trade_date timestamptz
);
insert into company (name) values ('goodco');
insert into company (name) values ('badco');
insert into officer (company_id, name, title) values (1, 'johnny b. good', 'ceo');
insert into officer (company_id, name, title) values (1, 'mother teresa', 'saint');
insert into officer (company_id, name, title) values (2, 'leroy brown', 'ceo');
insert into insider_trade (officer_id, shares_traded, share_price, trade_date) values
(3, '50000', '6.66', '2006-07-04 1:23 PM PST');
Now, my database looks like this:
select * from company left join officer on company.id=officer.company_id left join insider_trade on officer.id=insider_trade.officer_id;
id | name | id | company_id | name | title | id | officer_id | shares_traded | share_price | trade_date
----+--------+----+------------+----------------+-------+----+------------+---------------+-------------+------------------------
1 | goodco | 1 | 1 | johnny b. good | ceo | | | | |
1 | goodco | 2 | 1 | mother teresa | saint | | | | |
2 | badco | 3 | 2 | leroy brown | ceo | 1 | 3 | 50000 | 6.66 | 2006-07-04 14:23:00-07
(3 rows)
If I want to know whom to investigate, I might do a query like this:
select * from company left join officer on company.id=officer.company_id left join insider_trade on officer.id=insider_trade.officer_id where insider_trade.id is not null;
id | name | id | company_id | name | title | id | officer_id | shares_traded | share_price | trade_date
----+-------+----+------------+-------------+-------+----+------------+---------------+-------------+------------------------
2 | badco | 3 | 2 | leroy brown | ceo | 1 | 3 | 50000 | 6.66 | 2006-07-04 14:23:00-07
(1 row)
Now that I have this query, in order to make my case, I need to "sign
off" on all of the individual data that went into it. I would like to
do something like:
select last_query_shown_tuples();
schema | table_name | pk_columns | pk_values
--------+---------------+------------+-----------
public | company | [id] | [2]
public | officer | [id] | [3]
public | insider_trade | [id] | [1]
(3 rows)
(I am aware that, at least theoretically, my query is just as reliant
upon the tuples that it *didn't* show, since they were acted upon by
the where clause.)
(I am also aware that this example is sub-optimal in that the primary
keys are all single ints called 'id' and that they are present
individually in the query result -- imagine that I used an aggregate
function in a GROUP BY, for example, and you can see how the ids relied
upon would become opaque, or imagine a multicolumn pk and you
understand my odd notation.)
Finally, I'd want to end up with something like:
select last_query_shown_tuples();
schema | table_name | pk_columns | pk_values | audited_ts
--------+---------------+------------+-----------+------------
public | company | [id] | [2] | 2006-08-31 18:52
public | officer | [id] | [3] | 2006-08-31 18:55
public | insider_trade | [id] | [1] | 2006-08-31 18:57
(3 rows)
Is something like this even possible, much less doable at present? I
have an inkling that with all of the capabilities of information_schema,
rules, and triggers, it should be pretty close to doable.
Best,
Randall
--
Randall Lucas Tercent, Inc. DF93EAD1
From | Date | Subject | |
---|---|---|---|
Next Message | Alvaro Herrera | 2006-09-01 03:15:11 | Re: [GENERAL] Thought provoking piece on |
Previous Message | Robert Treat | 2006-09-01 00:55:11 | Re: Status on Fedora Core Extras packaging |