Re: Proper relational database?

From: <david(at)andl(dot)org>
To: <pgsql-general(at)postgresql(dot)org>
Subject: Re: Proper relational database?
Date: 2016-04-24 12:10:20
Message-ID: 008301d19e22$48109ce0$d831d6a0$@andl.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

> owner(at)postgresql(dot)org] On Behalf Of Thomas Munro

> So what incremental improvements could we steal from "properly relational"
> query languages?

Here is my list of deficiencies in some or all dialects of SQL.
-Relation and tuple as data types
-Globally exclude NULLs and duplicate columns
-Relation with no columns, null key
-Natural antijoin
-Tuple join operations
-Tuple tests for equality, superset, subset
-Tuple test for set membership of relation
-Relation tests for equality, superset, subset
-Relation set operations include symmetric difference
-User-defined functions of arbitrary complexity
-User-defined aggregation functions of arbitrary complexity
-User-defined ordered queries of arbitrary complexity
-Iterative/recursive queries (when)

> Here's one I've thought about, trivial as it may be. I noticed that Tutorial
> D (and apparently Andl too) includes a [NOT] MATCHING operator (alternatively
> spelled SEMIJOIN and SEMIMINUS) corresponding to the ⋉ (semi-join) and ▷
> (anti-join) operators from relational algebra. In SQL you write [NOT] EXISTS
> or [NOT] IN in the WHERE clause, rather than something explicit in a <joined
> table> clause, though experienced users often talk explicitly about semi- and
> anti-joins, both because of the theory and because the terms show up in query
> plans.

Yes, there are two joins (join and antijoin). Semijoin is just one of a number of projections following a join, but antijoin is a quite different algorithm.

Antijoin is quite hard to write in SQL in such a way that the query planner will do the right thing. There is a lot of variation between dialects.

> A recent blog post[1] argues that SQL should have a SQL92-style JOIN syntax
> for this and points at a couple of products that have added one[2][3]. I
> guess it might be hard to convince the Postgres community to add support for
> a non-standard syntax that doesn't give you anything you can't already do,
> but as an idea I find it interesting and it seems to be in the spirit of the
> part of the Third Manifesto that says: "support[ing] the usual operators of
> the relational algebra [..]. All such operators shall be expressible without
> excessive circumlocution."

The purpose is simply that explicit syntax allows for explicit query optimisation.
>
> For example, say we want all students who have one or more exam today:
>
> SELECT s.student_id, s.name
> FROM student s
> WHERE EXISTS (SELECT 1
> FROM exam e
> WHERE e.student_id = s.student_id
> AND e.exam_date = CURRENT_DATE)
>
> I don't know Tutorial D, but I think it might express that with something
> like:
>
> ( student MATCHING exam
> WHERE exam_date = CURRENT_DATE )
> { student_id, name }
>
> With 10 minutes of prototype hacking I convinced Postgres to accept SEMI and
> ANTI like this:
>
> SELECT s.student_id, s.name
> FROM student s SEMI JOIN exam e USING (student_id)
> WHERE e.exam_date = CURRENT_DATE
>
> I guess a real version should accept (or require?) LEFT or RIGHT before
> SEMIANTI.

Andl has semijoin, rsemijoin, ajoin and rajoin. They are trivial to add once you have the basic algorithm, but Tutorial D does not.

When using this hypothetical syntax I think you should be allowed
> to refer to e.exam_date in the WHERE clause but not in the SELECT list
> (Impala apparently does allow you to see data from exam, and returns values
> from an arbitrary matching row, but that doesn't seem right to me). But the
> EXISTS syntax is correspondingly strange in that it requires you to provide a
> SELECT list which is entirely discarded, so people often write "*" or "1"
> (erm, OK, I guess you can use an empty select list in recent Postgres).

SQL has an implicit ordering of query evaluation -- you will often need to write a nested subquery or correlated query for what should be very straightforward situations. That's another thing that's easy to fix, if allowed.

Regards
David M Bennett FACS

Andl - A New Database Language - andl.org

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Geoff Winkless 2016-04-24 12:37:30 Re: Proper relational database?
Previous Message Geoff Winkless 2016-04-24 11:29:37 Re: Proper relational database?