Re: On disable_cost

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: On disable_cost
Date: 2024-08-01 18:03:18
Message-ID: CA+TgmoZRwy8202vxbUPBeZd_Tx5NYVtmpvBnJnOzZS3b81cpkg@mail.gmail.com

On Wed, Jul 31, 2024 at 10:01 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> I've reviewed both patches, here's what I noted down during my review:

Thanks.

> 0. I've not seen any mention so far about postgres_fdw's
> use_remote_estimate. Maybe changing the costs is fixing an issue that
> existed before. I'm just not 100% sure on that.
>
> patched:
> Foreign Scan on ft (cost=100.00..671.00 rows=2550 width=4)
>
> master:
> Foreign Scan on ft (cost=10000000100.00..10000000671.00 rows=2550 width=4)
>
> I kinda think that might be fixing an issue that I don't recall being
> reported before. I think we shouldn't really care that much about what
> nodes are disabled on the remote server and not having disabled_cost
> applied to that gives us that.

Hmm, I think it's subjective which behavior is better. If somebody
thought the new behavior was worse, they might want the remote side's
count of disabled nodes to be propagated to the local side, but I'm
disinclined to go there. My guess is that it doesn't matter much
either way what we do here, so I'd rather not add more code.

> 1. The final sentence of the function header comment needs to be
> updated in estimate_path_cost_size().

Fixed.

> 2. Does cost_tidscan() need to update the header comment to say
> tidquals must not be empty?

IMHO, no. The assertions I added to that function were intended as
documentation of what that function was already assuming about the
behavior of its caller. I had to trace through the logic in tidpath.c
for quite a while to understand why cost_tidscan() was not completely
broken. To spare the next person the trouble of working that out, I
added assertions. Now we could additionally add commentary in English
that restates what the assertions already say, but I feel like having
the assertions is good enough. If somebody ever whacks around
tidpath.c such that these assertions start failing, I think it will be
fairly clear to them that they either need to revert their changes in
tidpath.c or upgrade the logic in this function to cope.
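
For reference, the assertions are along these lines (from memory, so
possibly not the exact text in the patch):

    /*
     * tidpath.c never builds a TID path with an empty qual list, so
     * cost_tidscan() can assume tidquals is non-empty here.
     */
    Assert(tidquals != NIL);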

> 3. final_cost_nestloop() seems to initially use the disabled_nodes
> from initial_cost_nestloop() but then it goes off and calculates it
> again itself. One of these seems redundant.

Oops. Fixed.

> The "We could include
> disable_cost in the preliminary estimate" comment explains why it was
> originally left to final_cost_nestloop(), so maybe worth sticking to
> that? I don't quite know the full implications, but it does not seem
> worth risking a behaviour change here.

I don't really see how there could be a behavior change here, unless
there's a bug. Dealing with the enable_* flags in initial_cost_XXX
rather than final_cost_XXX could be better or worse from a performance
standpoint, and it could make for cleaner or less clean code, but the
user-facing behavior should be identical, bugs aside.

I changed this because of the logic in add_path_precheck(): it exits
early as soon as it sees a path whose total cost is greater than the
cost of the proposed new path. Since the patch's aim is to treat
disabled_nodes as a high-order component of the cost, we need to make
the same decision by comparing the counts of disabled_nodes first
and, only if those are equal, comparing total_cost. We can't do that
if we don't have the count of disabled_nodes for the proposed new
path.

I think this may be a bit hard to understand, so let me give a
concrete example. Suppose we're planning some join where one side can
only be planned with a sequential scan and sequential scans are
disabled. We have ten paths in the path list and they have costs of
1e10+100, 1e10+200, ..., 1e10+1000. Now add_path_precheck() is asked
to consider a new path where there is a disabled node on BOTH sides of
the join -- the one side has the disabled sequential scan, but now the
other side also has something disabled, so the cost is let's say
2e10+79. add_path_precheck() can see at once that this path is a
loser: it can't possibly dominate any path that already exists,
because it costs more than any of them. But when you take disable_cost
out, things look quite different. Now you have a proposed path with a
total_cost of 79 and a path list with costs of 100, ..., 1000. If
you're not allowed to know anything about disabled_nodes, the new path
looks like it might be valuable. You might decide to construct it
and try inserting it into the pathlist, which will end up being
useless; and even if you don't, you're going to compare its pathkeys
and parameterization to each of the 10 existing paths before giving
up.
Bummer.
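
In code form, the comparison rule the patch needs is roughly this (an
illustrative sketch with made-up variable names, not the actual
add_path_precheck() code):

    /*
     * disabled_nodes is a higher-order component of the cost: only
     * if the counts tie do we fall through to comparing total_cost.
     * In the example above, the proposed path has 2 disabled nodes
     * vs. 1 for every existing path, so it loses immediately and its
     * total_cost of 79 vs. 100 is never even considered.
     */
    if (new_disabled_nodes != old_disabled_nodes)
        new_path_is_cheaper = (new_disabled_nodes < old_disabled_nodes);
    else
        new_path_is_cheaper = (new_total_cost < old_total_cost);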

So, to avoid getting much stupider than it is currently,
add_path_precheck() needs a preliminary estimate of the number of
disabled nodes just like it needs a preliminary estimate of the total
cost. And to avoid regressions, that estimate needs to be pretty good.
A naive estimate would be to just add up the number of disabled_nodes
on the inner and outer paths, but that would be a regression in the
merge-join case, because initial_cost_mergejoin() calls cost_sort()
for the inner and outer sides and that will add disable_cost if sorts
are disabled. If you didn't take the effect of cost_sort() into
account, you might think that your number of disabled_nodes was going
to be substantially lower than it really would be, leading to wasted
work as described in the last paragraph. Plus, since
initial_cost_mergejoin() is already incurring the overhead of calling
cost_sort() to get the total cost numbers, it would be silly not to
save the count of disabled nodes: if we didn't, we'd have to redo the
cost_sort() call in final_cost_mergejoin(), which would be expensive.
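
Concretely, the accounting in initial_cost_mergejoin() ends up
looking roughly like this (a sketch with the cost_sort() arguments
elided and names approximate, not the literal patch code):

    /* the join itself counts as disabled if merge joins are off */
    disabled_nodes = enable_mergejoin ? 0 : 1;

    if (outersortkeys)          /* outer side needs an implicit sort */
    {
        cost_sort(&sort_path, root, outersortkeys, ...);
        /* pick up any disabled node charged by cost_sort() */
        disabled_nodes += sort_path.disabled_nodes;
    }
    /* ...and likewise for innersortkeys... */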

If we wanted to make our estimate of the # of disabled nodes exactly
comparable to what we now do with disable_cost, we would postpone the
"if (!enable_WHATEVERjoin) ++disabled_nodes" step to the
final_cost_XXX functions and do all of the other accounting related
to disabled nodes at the initial_cost_XXX phase. But I do not like
that approach.
Postponing one trivial portion of the disabled_nodes calculation to a
later time won't save any significant number of CPU cycles, but it
might confuse people reading the code. You then have to know that the
disabled_nodes count that gets passed to final_cost_XXX is not yet the
final count, but that you may still need to add 1 for the join itself
(but not for the implicit sorts that the join requires, which have
already been accounted for). That's the kind of odd definition that
breeds bugs. Besides, it's not as if moving that tiny bit of logic to
the initial_cost_XXX functions has no upside: it could allow
add_path_precheck() to exit earlier, thus saving cycles.

(For the record, the explanation above took about 3 hours to write, so
I hope it's managed to be both correct and convincing. This stuff is
really complicated.)

> 4. I wonder if it's worth doing a quick refactor of the code in
> initial_cost_mergejoin() to get rid of the duplicate code in the "if
> (outersortkeys)" and "if (innersortkeys)" branches. It seems ok to do
> outer_path = &sort_path. Likewise for inner_path.

I don't think that's better.

> 5. final_cost_hashjoin() does the same thing as #3

Argh. Fixed.

> 6. createplan.c adds #include "nodes/print.h" but doesn't seem to add
> any code that might use anything in there.

Fixed.

> 8. There's something weird with CTEs too.
>
> I'd expect the final sort to have disabled_nodes == 2 since
> disabled_cost has been added twice in master.

Right now, disabled node counts don't propagate through SubPlans (see
SS_process_ctes). Maybe that needs to be changed, but aside from
looking weird, does it do any harm?

> 7. create_lockrows_path() needs to propagate disabled_nodes.
> 9. create_set_projection_path() needs to propagate disabled_nodes too:
> 10. create_setop_path() needs to propagate disabled_nodes.
> 11. create_modifytable_path() needs to propagate disabled_nodes.

I changed all of these, but I think these examples only establish that
those nodes DO NOT propagate disabled_nodes, not that they need to. If
we're past the point of making any choices based on costs, then
maintaining disabled_nodes or not doing so won't affect correctness.
That's not to say these aren't good to tidy up, and some of them may
well be bugs, but I don't think your test cases prove that. What
primarily matters is whether the enable_BLAH GUCs get respected; the
exact contents of the EXPLAIN output are somewhat more arguable.
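
(For the record, the fix in each case is just to copy the child's
count up, mirroring what those functions already do for the cost
fields -- e.g. in create_lockrows_path(), something like:

    pathnode->path.disabled_nodes = subpath->disabled_nodes;

and similarly for the other three.)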

> 12. For the 0002 patch, I do agree that having this visible in EXPLAIN
> is a must. I'd much rather see: Disabled: true/false. And just
> display this when the disabled_nodes is greater than the sum of the
> subpaths. That might be much more complex to implement, but it's
> going to make it much easier to track down the disabled nodes in very
> large plans.

I think it's going to be very unpleasant if we have the planner add
things up and then try to have EXPLAIN subtract them back out again.
One problem with that is that all of the test cases where you just
showed disabled_nodes not propagating upward wouldn't actually show
anything any more, because disabled_nodes would not have been greater
in the parent than in the child. So those are oversights in the code
that are easy to spot now but would become hard to spot with this
implementation. Another problem is that the EXPLAIN code itself could
contain bugs, or slightly more broadly, get out of sync with the logic
that decides what to add up. It won't be obvious what's happening:
some node that is actually disabled just won't appear to be, or the
other way around, and it will be hard to understand what happened,
because you won't be able to see the raw counts of disabled nodes that
would allow you to deduce where the error actually is.

One idea that occurs to me is to store TWO counts in each path node
and each plan node: the count of disabled nodes excluding the node
itself, and the count including it. Then EXPLAIN can just test
whether they differ. If the difference is 1, the node is disabled; if
0, it's enabled; if anything else, there's a bug (and it could print
the delta, or each value separately, to help localize such bugs). The
problem with that is that it eats up more space in
performance-critical data structures, but perhaps that's OK; I don't
know.
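
Sketched out, with hypothetical field names, the EXPLAIN-side test
would be something like:

    /* count including this node minus count excluding it */
    int     delta = plan->disabled_nodes_incl - plan->disabled_nodes_excl;

    if (delta == 1)
        ExplainPropertyBool("Disabled", true, es);
    else if (delta != 0)
        /* shouldn't happen; show both raw counts to localize the bug */
        elog(WARNING, "inconsistent disabled node counts: %d vs. %d",
             plan->disabled_nodes_incl, plan->disabled_nodes_excl);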

Another thought is that right now you just see the disable_cost values
added up with the rest of the cost. So maybe propagating upward is not
really such a bad behavior; it's what we have now.

This point probably needs more thought and discussion, but I'm out of
time to work on this for today, and out of mental energy too. So for
now here's v5 as I have it.

--
Robert Haas
EDB: http://www.enterprisedb.com

Attachment Content-Type Size
v5-0001-Treat-number-of-disabled-nodes-in-a-path-as-a-sep.patch application/octet-stream 74.1 KB
v5-0002-Show-number-of-disabled-nodes-in-EXPLAIN-ANALYZE-.patch application/octet-stream 14.7 KB
