| From: | Roberto Cornacchia <cornacch(at)CS(dot)UniBO(dot)IT> | 
|---|---|
| To: | pgsql-hackers(at)postgresql(dot)org | 
| Cc: | Paolo Ciaccia <ciaccia(at)CS(dot)UniBO(dot)IT>, Andrea Ghidini <ghidini(at)CS(dot)UniBO(dot)IT> | 
| Subject: | Generalized Top Queries on PostgreSQL | 
| Date: | 2000-02-19 01:23:38 | 
| Message-ID: | Pine.GSO.3.96.1000219022048.17025B-100000@adina.cs.unibo.it | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
Hello everyone,
Dr. P. Ciaccia, A. Ghidini and I (R. Cornacchia), have recently
developed, and implemented on PostgreSQL, a class of TOP Queries. 
Here I would like to briefly present our main results.
The proposed sintax is the following:
SELECT
FROM
WHERE
STOP AFTER <N>
  FOR EACH <stop-grouping attribute list>
  RANK BY <ranking specification>
ORDER BY 
where both FOR EACH and RANK BY (and STOP AFTER clause as a whole) are
optional.
Here is an example:
----
"Retrieve the 10 highest paid employees for each department.
 Order the results first on department name and then on employee name"
SELECT Dept.name, Emp.name, Emp.salary
FROM Emp, Dept
WHERE Emp.dno = Dept.dno
STOP AFTER 10
  FOR EACH Dept.dno
  RANK BY Emp.salary DESC
ORDER BY Dept.name, Emp.name
----
We called such a query "Generalized Top Query".
The semantics introduced by the example is derived from the one
proposed by Carey and Kossmann ("On Saying 'Enougth Already' in
SQL", 1997). The main points of our extension are: 
1) You can obtain the <N> Top rows as dictated by the RANK BY
specification and then produce the results according to the 
ORDER BY specification. This means much more flexibility.
2) You can obtain the Top N rows for each of the "groups" which are 
individuated by the FOR EACH specification 
(from here "Generalized" top query).
Please consider that our semantics completely includes the current 
LIMIT clause capabilities, offering at the same time some additional 
ones.
For what concerns PostgreSQL, one of our first issues were to keep
basically unchanged
the current framework. 
Here is a brief list of major changes we operated:
- We provided 2 new physical operators:
  ScanStop: stops the stream to <N> rows          
  SortStop: performs an ad-hoc sort on the stream, 
            retaining only the <N> top rows. 
- The optimizer can operate a push-down in the path-tree of 
  those two operators. It means we reduce the 
  stream cardinality as soon as possible, leading to a great
  improvement on the performances of the subsequent operators.
- A larger number of optimizable operators force the optimizer to
  generate more	plans to handle,
  so we extended the operators properties and the pruning rules.
- We extended the cost model as well, introducing estimates for the
  cost of producing the FIRST N tuples.
- The rules involved in the Stop operators placement are mainly
  based upon referential integrity constraints. In this way we 
  provided a *temporary* solution to this Postgres lack, storing
  the informations on constraints in two new system catalogs.
- The FOR EACH generalization leads to natural generalization
  of all the above concepts.
- The evaluation of GROUP BY  clause can be performed before and after
  the STOP AFTER, and both makes sense. In order to optimize the Stop
  After operator, we choose to evaluate it before the group by 
  (and the order by).
* Let us summarize the current state of our work. *
- Our extension is highly optimized and can lead to performance
  improvements of several orders of magnitude in comparison with the
  current "LIMIT approach"
- It has a low impact on the original PostgreSQL code
- It does not affect the usual processing of classical queries
- It is soon expected to work with views. 
- It works with subqueries
- It works with cursors
- It efficently makes use of indices
- It is updated to a 6.6 snapshot of November 1999 (we are waiting 
  for a more stable release)
--------------------------------------------------------------------
More details on this subject are going to be presented in a
forthcoming paper, available if interested. 
We would be glad to receive your comments on this work.
Best regards,
R. Cornacchia (cornacch(at)cs(dot)unibo(dot)it) Computer Science, University of
Bologna
A. Ghidini (ghidini(at)cs(dot)unibo(dot)it) Computer Science, University of Bologna
Dr. Paolo Ciaccia (ciaccia(at)cs(dot)unibo(dot)it) DEIS CSITE-CNR, University of
Bologna 
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Marc Tardif | 2000-02-19 01:26:11 | handling multiple file descriptors | 
| Previous Message | Bruce Momjian | 2000-02-19 00:38:35 | Re: [HACKERS] Re: SQL compliance - why -- comments only at psql level ? |