Quick Links

Re: Questions on specifying table relationships

From:	Patrick Bakker <patrick(at)vanbelle(dot)com>
To:	'Richard Huxton' <dev(at)archonet(dot)com>, "'PostgreSQL General (E-mail)'" <pgsql-general(at)postgresql(dot)org>
Subject:	Re: Questions on specifying table relationships
Date:	2002-11-01 19:28:27
Message-ID:	A9CE1D556F89DD4FBA4CF797215DF61A02F366@20svbl1.vanbelle.local
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

Comments are in-line.

> On Friday 01 Nov 2002 5:34 pm, Patrick Bakker wrote:
> > Thanks for the reply Richard.
> > I was thinking about your query templates suggestion and I
> > don't think that is a sufficient solution for what I'm hoping
> > to do. I have two purposes in mind for the query generator.
> >
> > My application allows the user to choose a starting point
> > (ie. item, order, customer, location) and then shows a list - on the
> > left-hand side of the screen, at the full height of the screen -
> > presenting identifying elements (ie. name, order #, etc.) specific to
> > whichever starting point the user has chosen. In this way, as the user
> > changes selections on this list the content on the right automatically
> > updates to reflect the new selection.
> >
> > Now the user can change any of the options on the query for each of
these
> > selection lists, in any combination, to create arbitrary selection lists
> > which are more focused for their purposes. Presenting each possible
query
> > is likely to get very involved and I'd have to repeat it for every
starting
> > point.
> >
> > Additionally, I'm planning on using the same query system to describe
> > reports in external files. The entire report system will consist of
> > externally defined reports like this. So in effect the query system I'm
> > looking for here is the underlying organization which would make your
query
> > template system possible and completely dynamic. ie. I can add a new
report
> > to the system by dropping in a report definition file in the server's
> > shared network drive where the application reads its configuration from.
>
> OK - so how would I create something like:
>
> SELECT o.ord_date, p.prod_type, sum(o.sale_price) AS tot_sales
> FROM orders o, products p
> WHERE o.prod_id=p.prod_id
> GROUP BY o.ord_date, p.prod_type;

The definition of a static query template for this could be:

<displayable-fields>
<displayable-field default="yes">
<source>Order.orderDate</source>
</displayable-field>
<displayable-field default="yes">
<source>Order.Product.type</source>
</displayable-field>
<displayable-field default="yes">
<source>SUM(Order.salePrice)</source>
</displayable-field>
</displayable-fields>
<groupable-fields>
<groupable-field default="yes">
<source>Order.orderDate</source>
</groupable-field>
<groupable-field default="yes">
<source>Order.Product.type</source>
</groupable-field>
</groupable-fields>

-- The query generator parses the source fields and converts from an
EJB-derived notation to the actual
database tables and columns. In addition it recognizes expressions. The
part of the generator that forms
the actual SQL queries is database specific and converts from a standard
list of functions, operators,
etc. to the database specific ones (not a problem in this case).

> Does your LH column list *all* the possible fields resulting
> from all possible joins on the order table (products, customers,
> departments...)

No. It only resolves as far as is necessary. ie. if the user is looking at
orders and has only used order fields in defining their selection list then
only the order table will be involved in the query. If the user now tries to
limit the displayable orders by a field in the customer's table (ie.
salesman), then the query will now retrieve the unique orders which match
this query. In the case of the selection lists the query template definition
has a very limited number of displayable fields but has a great variety in
the how it can be limited. (see end for more explanation).

> > Can you elaborate on your statement "without knowing what
> > it means to connect two tables via two columns I'm not sure the
> > automated system could decide between options."? Are you referring to
> > the autoquery generator being unable to guess the join because it
depends on the
> > meaning of the data in the tables or are you saying that the type of
join
> > will vary for each query and therefore cannot be known ahead of time?
>
> I'm saying that even if you specify all possible joins
> between 2 tables, you might have to choose between two ways connecting 3
tables.
> The only way to do that is if you know what the columns *mean* and what
your
> query is trying to achieve.

Ok. Each query template would hopefully provide the context necessary to
decide which
join is intended.

> > I have a few other options for providing more information
> > to the auto-query generator. Since I'm parsing all of the fields
> > and relationships from EJB descriptors (jbosscmp-jdbc.xml,
> > jaws.xml and ejb-jar.xml) I have the following information available for
> > describing relationships:
> >
> [sample information]
> >
> > If I do it this way and require every query to begin with
> > the same EJB bean then each query would fully describe the relationship
> > path needed for each field.
>
> Well, you have enough here to specify relationships, but I
> still think it will need a sentient being to decide on the details. I
could be
> wrong though.

Hmm. I have the feeling you could be right, but I can't think of a case
where this would be true.

> Are you familiar with MS-Access (or similar consumer DBs) and
> it's grid way of building queries? That's about as general purpose and
> graphical as I've seen. You identify the tables you want, and then the
columns within.
> It automatically indicates joins that it has been told about, or
> you can make your own. You still need to know something about the
structure of the
> database though.

Yes. However, I'm looking for more of a middle-ground. Access has two
stages - Query Design and Results. The ability to specify parameters is sort
of a middle-ground where you can adjust the query and get different results.
I'm looking to make a query template where the user can make more extensive
changes - ie. design, customize, report. The structure of the query template
will have to be sufficiently precise so as to avoid ambiguity in joins but
otherwise be flexible for a user's query.

So each query template defines the following (broken down by SQL syntax):
SELECT
- Which fields the user can choose to include/exclude.
- Which fields are mandatory and the user cannot remove from the query.
- Further, mandatory fields may also be hidden from the user.
FROM
- Fully determined by the query automater based on all of the fields used
in SELECT, WHERE, GROUP BY, HAVING
and ORDER BY statements.
WHERE and HAVING
- There is a list of limitable fields which the user can use to apply
limits to the query if they wish.
- Again some may be fixed and or invisible to the user.
- Choice between a WHERE and HAVING is determined while parsing the source
of the field. If its an aggregate
function like SUM the limit will be put in the HAVING clause of the query.
- Each field listed in the limitable list of the query also has an
associated data type with it and there are
a limited number of these (string, date, money, number, etc...).
- The user sees limits based on the type of the field. Here are some
examples:

[ Customer Name ] [ begins with ] [ ]
[ Order Date ] [ between dates ] [ 01/01/2002 ] and [ 01/31/2002 ]

GROUP BY
- Again there is a list of fields which the user can use to specify the
grouping of the query.

ORDER BY
- A list of fields which the user can use to order the arrangement of the
query.

I'm now convinced it should work fairly well, so I am going to begin coding
this and try to flush out the remaining snags as I run into them.

Patrick

Browse pgsql-general by date

	From	Date	Subject
Next Message	Tom Lane	2002-11-01 19:47:44	Re: FIFO Queue Problems
Previous Message	Booth, Robert	2002-11-01 19:24:14	Update FROM clause?