Re: On disable_cost

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, David Rowley <dgrowleyml(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: On disable_cost
Date: 2024-06-12 18:11:23
Message-ID: 20240612181123.oanjkuhb6qlautb3@awork3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2024-06-12 11:35:48 -0400, Robert Haas wrote:
> Subject: [PATCH v2 3/4] Treat the # of disabled nodes in a path as a separate
> cost metric.
>
> Previously, when a path type was disabled by e.g. enable_seqscan=false,
> we either avoided generating that path type in the first place, or
> more commonly, we added a large constant, called disable_cost, to the
> estimated startup cost of that path. This latter approach can distort
> planning. For instance, an extremely expensive non-disabled path
> could seem to be worse than a disabled path, especially if the full
> cost of that path node need not be paid (e.g. due to a Limit).
> Or, as in the regression test whose expected output changes with this
> commit, the addition of disable_cost can make two paths that would
> normally be distinguishible cost seem to have fuzzily the same cost.
>
> To fix that, we now count the number of disabled path nodes and
> consider that a high-order component of both the cost. Hence, the
> path list is now sorted by disabled_nodes and then by total_cost,
> instead of just by the latter, and likewise for the partial path list.
> It is important that this number is a count and not simply a Boolean;
> else, as soon as we're unable to respect disabled path types in all
> portions of the path, we stop trying to avoid them where we can.

> if (criterion == STARTUP_COST)
> {
> if (path1->startup_cost < path2->startup_cost)
> @@ -118,6 +127,15 @@ compare_fractional_path_costs(Path *path1, Path *path2,
> Cost cost1,
> cost2;
>
> + /* Number of disabled nodes, if different, trumps all else. */
> + if (unlikely(path1->disabled_nodes != path2->disabled_nodes))
> + {
> + if (path1->disabled_nodes < path2->disabled_nodes)
> + return -1;
> + else
> + return +1;
> + }

I suspect it's going to be ok, because the branch is going to be very
predictable in normal workloads, but I still worry a bit about making
compare_path_costs_fuzzily() more expensive. For more join-heavy queries it
can really show up and there's plenty ORM generated join-heavy query
workloads.

If costs were 32 bit integers, I'd have suggested just stashing the disabled
counts in the upper 32 bits of a 64bit integer. But ...

<can't resist trying if I see overhead>

In an extreme case i can see a tiny bit of overhead, but not enough to be
worth worrying about. Mostly because we're so profligate in doing
bms_overlap() that cost comparisons don't end up mattering as much - I seem to
recall that being different in the not distant past though.

Aside: I'm somewhat confused by add_paths_to_joinrel()'s handling of
mergejoins_allowed. If mergejoins are disabled we end up reaching
match_unsorted_outer() in more cases than with mergejoins enabled. E.g. we
only set mergejoin_enabled for right joins inside select_mergejoin_clauses(),
but we don't call select_mergejoin_clauses() if !enable_mergejoin and jointype
!= FULL. I, what?

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ranier Vilela 2024-06-12 18:14:33 Re: Improve the granularity of PQsocketPoll's timeout parameter?
Previous Message Robert Haas 2024-06-12 18:08:31 Re: race condition in pg_class