Re: When you really want to force a certain join type?

From: Gunther Schadow <raj(at)gusw(dot)net>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: When you really want to force a certain join type?
Date: 2022-12-29 07:31:59
Message-ID: c0cf4fbd-a5db-e5dc-935f-399a355c02ee@gusw.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 12/28/2022 10:48 AM, Justin Pryzby wrote:
> Maybe the new parameter in v15 would help.
>
> https://www.postgresql.org/docs/15/runtime-config-query.html#GUC-RECURSIVE-WORKTABLE-FACTOR
> recursive_worktable_factor (floating point)
>
> Sets the planner's estimate of the average size of the working table
> of a recursive query, as a multiple of the estimated size of the
> initial non-recursive term of the query. This helps the planner
> choose the most appropriate method for joining the working table to
> the query's other tables. The default value is 10.0. A smaller value
> such as 1.0 can be helpful when the recursion has low “fan-out” from
> one step to the next, as for example in shortest-path queries. Graph
> analytics queries may benefit from larger-than-default values.

Thanks that's something I will try after I upgraded.

But speaking of such other uses for recursive queries, I can say I have
quite a bit of experience of turning graph related "traversal" and
search and optimization and classification queries into SQL, in short,
computing the transitive closure. And usually I have stayed away from
the recursive WITH query and instead set up a start table and then
perform the iterative step. And there are two ways to go about it. Say
you have a graph, simple nodes and arcs. You want to find all paths
through the graph.

Now you can set up start nodes and then extend them at the end by
joining the recursive table to the simple arc table and extend your path
every time. This is what the WITH RECURSIVE supports. These queries
linearly iterate as many times as the length of the longest path.

with recursive arcs as (
  select source, target, 1 as distance, ...
    from ...
), paths as (
  select * from arcs
  union all
  select a.source, b.target, a.distance + 1 as distance, ...
   from paths a inner join_*arcs *_b
on(b.source = a.target)
)
select * from paths

But another way is to join paths with paths. It would be this, which I
think I have seen postgresql unable to deal with:

with recursive arcs as (
  select source, target, 1 as distance, ...
    from ...
), paths as (
  select * from arcs
  union all
  select a.source, b.target, a.distance + 1 as distance, ...
   from paths a inner join_*paths *_b
on(b.source = a.target)
)
select * from paths

So, instead of the recursive union to join back to the fixed table, it
joins the recursive table to the recursive table, and the benefit of
that is that these queries converge much quicker. Instead of going 10
iterations to find a path of length 10, you go 1 iteration to find all
paths of 2 (if distance 1 is the base table of all arcs), then next you
find paths of up to 4 then you find paths of up to 8, then 16, 32, ...
This converges much faster. I usually do that as follows

create table paths as
select source, target, 1 as distance, ...
from arcs;

prepare rep as
insert into paths(source, target, distance, ...)
select a.source, b.target, a.distance + b.distance as distance, ...
  from paths a inner join paths b on(b.source = a.target)
except
select * from paths;

execute rep;
execute rep;
...

or instead of the except, in order to minimize distances:

where not exists (select 1 from paths x
                   where x.source = a.source
and x.target = a.target
and x.distance < a.distance)

I have even done a group by in the recursive step which replaces the
paths relation at every iteration (e.g. with only minimal distance paths).

Since this converges so rapidly I often prefer that approach over a
recursive union query.

I think in IBM DB2 allowed to join the recursive table with itself. Is
this something you want to support at some time?

Also, why even use the RECURSIVE keyword, DB2 didn't need it, and the
query analyzer should immediately see the recursion, so no need to have
that keyword.

regards,
-Gunther

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2022-12-29 07:52:10 Re: When you really want to force a certain join type?
Previous Message Justin Pryzby 2022-12-28 15:48:37 Re: When you really want to force a certain join type?