Re: Remove restrictions in recursive query

From: Renan Alves Fonseca <renanfonseca(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Remove restrictions in recursive query
Date: 2025-03-28 18:42:55
Message-ID: CAN_p2Qj-9T7e2Ccrb5aEReJOdwr4uLq6YM1gxzH8RC1VCQL+1A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Mar 28, 2025 at 1:14 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Well, we extend the spec in lots of places. I'd be okay with removing
> this restriction if I were sure there were no bad consequences, but
> it seems likely that there are some. College math was way too long
> ago for me to be sure about the "fixed-point" business ... but I think
> what they may be on about is that rows produced by aggregation may not
> be stable in the face of adding more and more rows via additions to

After a quick math review, I'm quite convinced that the "fixed-point"
business applies only to the UNION [DISTINCT] case, in which the
relations can be seen as sets. Recursion with UNION ALL is a
completely different operation. I hope they mention that in the
standard. Consider this very basic recursive query:

WITH RECURSIVE t1 AS ( SELECT 1 UNION ALL SELECT * FROM t1) SELECT * FROM t1;

It does not converge. When using recursive UNION ALL it is the user
responsibility to create the conditions for convergence.

> the recursion workspace. In your example, I think it somewhat
> accidentally doesn't matter: an underlying row might get picked up
> and added to a dag.target-grouped row in one recursion step or
> a different one, but then you sum all those sums in the top-level
> query, so the top sum comes out the same regardless of just what sets
> the intermediate grouped rows consisted of. With a different query
> construction though, that would matter.

Not exactly accidental. I'm aware that the first GROUP BY runs in the
scope of the working table, and I'm aware that I can compose the sum
aggregates. Not every aggregate can be composed like that. The
following is the same query without the inner GROUP BY. As you've
mentioned, the SUM is much faster than COUNT (but still it has the
same complexity).

with recursive t1(node,nb_path) as
(select 1,1::numeric
union all
(select dag.target, 1
from t1 join dag on t1.node=dag.source)
) select sum(nb_path) from t1 join sink_nodes using (node) ;

Even if PostgreSQL allowed us to use GROUP BY in the recursive query,
we would be driving the intermediate steps, which feels wrong in a
declarative paradigm. Ideally, we would write the query above and
PostgreSQL would generate the most optimized execution that we've seen
before. I've seen that there is some ongoing work on PARTIAL
AGGREGATES. It would be beautiful if the optimizer could push down the
aggregate into the recursive clause. Does it sound possible?

Regards,
Renan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Nico Williams 2025-03-28 18:59:12 Re: Remove restrictions in recursive query
Previous Message Alvaro Herrera 2025-03-28 18:42:22 Re: Support NOT VALID / VALIDATE constraint options for named NOT NULL constraints