Re: Proper relational database?

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: davidb(at)pfxcorp(dot)com
Cc: Forums postgresql <pgsql-general(at)postgresql(dot)org>
Subject: Re: Proper relational database?
Date: 2016-04-24 07:36:37
Message-ID: CAEepm=1aqRjadSkoz7BUyy_FoVpGZ_VeQr0PTv8YRDXaOCEoAQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Sun, Apr 24, 2016 at 2:56 PM, <david(at)andl(dot)org> wrote:
>> One of the people involved in that was Hugh Darwen, who is one of the authors
>> of The Third Manifesto, which is an attempt to define what a properly
>> relational language and system should look like. So you could say the
>> experience of ISBL vs SQL has been folded into that effort.
>
> See http://www.thethirdmanifesto.com/.

So what incremental improvements could we steal from "properly
relational" query languages?

Here's one I've thought about, trivial as it may be. I noticed that
Tutorial D (and apparently Andl too) includes a [NOT] MATCHING
operator (alternatively spelled SEMIJOIN and SEMIMINUS) corresponding
to the ⋉ (semi-join) and ▷ (anti-join) operators from relational
algebra. In SQL you write [NOT] EXISTS or [NOT] IN in the WHERE
clause, rather than something explicit in a <joined table> clause,
though experienced users often talk explicitly about semi- and
anti-joins, both because of the theory and because the terms show up
in query plans.

A recent blog post[1] argues that SQL should have a SQL92-style JOIN
syntax for this and points at a couple of products that have added
one[2][3]. I guess it might be hard to convince the Postgres
community to add support for a non-standard syntax that doesn't give
you anything you can't already do, but as an idea I find it
interesting and it seems to be in the spirit of the part of the Third
Manifesto that says: "support[ing] the usual operators of the
relational algebra [..]. All such operators shall be expressible
without excessive circumlocution."

For example, say we want all students who have one or more exam today:

SELECT s.student_id, s.name
FROM student s
WHERE EXISTS (SELECT 1
FROM exam e
WHERE e.student_id = s.student_id
AND e.exam_date = CURRENT_DATE)

I don't know Tutorial D, but I think it might express that with something like:

( student MATCHING exam
WHERE exam_date = CURRENT_DATE )
{ student_id, name }

With 10 minutes of prototype hacking I convinced Postgres to accept
SEMI and ANTI like this:

SELECT s.student_id, s.name
FROM student s SEMI JOIN exam e USING (student_id)
WHERE e.exam_date = CURRENT_DATE

I guess a real version should accept (or require?) LEFT or RIGHT
before SEMI/ANTI. When using this hypothetical syntax I think you
should be allowed to refer to e.exam_date in the WHERE clause but not
in the SELECT list (Impala apparently does allow you to see data from
exam, and returns values from an arbitrary matching row, but that
doesn't seem right to me). But the EXISTS syntax is correspondingly
strange in that it requires you to provide a SELECT list which is
entirely discarded, so people often write "*" or "1" (erm, OK, I guess
you can use an empty select list in recent Postgres).

[1] https://blog.jooq.org/2015/10/13/semi-join-and-anti-join-should-have-its-own-syntax-in-sql/
[2] http://www.cloudera.com/documentation/archive/impala/2-x/2-0-x/topics/impala_joins.html
[3] https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins

--
Thomas Munro
http://www.enterprisedb.com

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tomas J Stehlik 2016-04-24 10:54:09 Re: Is it possible to recover the schema from the raw files?
Previous Message david 2016-04-24 05:49:08 Re: On the right tool (was Re: Proper relational database?)