Re: On disable_cost

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: On disable_cost
Date: 2024-08-02 03:34:16
Message-ID: CAApHDvo8EFm5A5f2S5hKEqSETTymnbSbFe=P-EJ=-8sAGKfWnA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 2 Aug 2024 at 06:03, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I think this may be a bit hard to understand, so let me give a
> concrete example. Suppose we're planning some join where one side can
> only be planned with a sequential scan and sequential scans are
> disabled. We have ten paths in the path list and they have costs of
> 1e10+100, 1e10+200, ..., 1e10+1000. Now add_path_precheck() is asked
> to consider a new path where there is a disabled node on BOTH sides of
> the join -- the one side has the disabled sequential scan, but now the
> other side also has something disabled, so the cost is let's say
> 2e10+79. add_path_precheck() can see at once that this path is a
> loser: it can't possibly dominate any path that already exists,
> because it costs more than any of them. But when you take disable_cost
> out, things look quite different. Now you have a proposed path with a
> total_cost of 79 and a path list with costs of 100, ..., 1000. If
> you're not allowed to know anything about disabled_nodes, the new path
> looks like it might be valuable. You might decide to construct it and
> try inserting into the pathlist, which will end up being useless, and
> even if you don't, you're going to compare its pathkeys and
> parameterization to each of the 10 existing paths before giving up.
> Bummer.

OK, so it sounds like you'd like to optimise this code so that the
planner does a little less work when node types are disabled. The
existing comment does mention explicitly that we don't want to do
that:

/*
* We could include disable_cost in the preliminary estimate, but that
* would amount to optimizing for the case where the join method is
* disabled, which doesn't seem like the way to bet.
*/

As far as I understand it from reading the comments in that file, I
see no offer of guarantees that the initial cost will be cheaper than
the final cost. So what you're proposing could end up rejecting paths
based on initial cost where the final cost might end up being the
cheapest path. Imagine you're considering a Nested Loop and a Hash
Join, both of which are disabled. Merge Join is unavailable as the
join column types are not sortable. If the hash join costs 99 and the
initial nested loop costs 110, but the final nested loop ends up
costing 90, then the nested loop could be rejected before we even get
to perform the final cost for it. The current code will run
final_cost_nestloop() and find that 90 is cheaper than 99, whereas
what you want to do is stop bothering with nested loop when we see the
initial cost come out at 110.

Perhaps it's actually fine if the initial costs are always less than
the final costs as, if that's the case, we won't ever reject any paths
based on the initial cost that we wouldn't anyway based on the final
cost. However, since there does not seem to be any comments mentioning
this guarantee and if you're just doing this to squeeze more
performance out of the planner, it seems risky to do for that reason
alone.

I'd say if you want to do this, you should be justifying it on its own
merit with some performance numbers and some evidence that we don't
produce inferior plans as a result. But per what I quoted above,
you're not doing that, you're doing this as a performance
optimisation.

I'm not planning on pushing this any further. I've just tried to
highlight that there's the possibility of a behavioural change. You're
claiming there isn't one. I claim there is.

David

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Lakhin 2024-08-02 04:00:01 Re: rare crash - FailedAssertion snapbuild.c Line: 580
Previous Message Kyotaro Horiguchi 2024-08-02 02:57:17 wrong translation file reference in pg_createsubscriber