Re: On disable_cost

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: On disable_cost
Date: 2024-10-07 15:09:10
Message-ID: CAKFQuwY9Emt4P2jU6KWiK=xaMp_JQvyS4PJG6BTUHWhwkD3+KQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Oct 4, 2024 at 10:37 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:

> I'd encourage anyone else on the sidelines who has an opinion on how
> to display the disabled-ness of a plan node in EXPLAIN to speak up
> now, even if it's just a +1 to something someone has already written.
> It would be nice to see what more people think.
>
>
As a DBA when I set one or more of the enable_* settings to false, and
explain a query, I need to know:

1, that the plan shown to me is constrained,
2, which constraints are in place, and
3, which constraints were violated.

The Settings option to Explain fulfills my second need. It is not a
precise solution nor is it automatic. Because of these two things it
really doesn't qualify as fulfilling the first need.

To fulfill the first need I would want to see a data block containing the
following information:
How many (>= 1) enable_* settings are set to false. This is the bare
requirement, but we can also include a count of how many violations exist,
thus aggregating the count of the third need. This information is not
specific to any node and thus should be displayed outside of the execution
tree, the specific choice consistent with the output format under
consideration.

The detail for the third need, violations, is tied to specific executor
nodes. The information provided here should inform me as to which specific
setting was violated as well as, if possible, why. This is effectively
three pieces of information: "Disabled: * (footnote)" The word disabled
is the indicator that this node type was requested to not be included in
the query plan. The * tells me exactly which of the disabled settings is
at play here, reducing the cognitive burden of memorizing node types to
settings. The footnote would be a reference into the documentation under
the enable_* setting that explains why this node is appearing in the query
plan even though I explicitly asked for it to be excluded. In a verbose
output (add a new violations option for this) it would even be possible to
save the trip to the documentation by adding the footnote text to the
explain output.

Now, existing proposals include another piece of data - for every node
calculate how many violations occur in its tree (inclusive). I'm not
understanding the motivation for this data. Knowing which nodes are
violations seems like it is enough. I could always count, and processing
tools could add this aggregate to their displays, but the count itself only
seems useful at the scope of the entire query plan. And possibly sub-plans.

So, by way of example:

set enable_sort=0;
explain (costs off, settings, violations) select * from lp order by a;

Append
-> Index Only Scan using lp1_a_idx on lp1 lp_1
-> Sort
Disabled: Sort (A)
Sort Key: lp_2.a
-> Seq Scan on lp2 lp_2

Disabled Planner Settings: 1
Disabled Node Violations: 1
Settings:
...
enable_sort = false
....
Violation Reasons:
Sort (A): The query contains an order by clause over data coming from a
table being sequentially scanned. This scan's output must be sorted to
fulfill the order by requirement.

I was considering doing a layout like:

Sort (disabled_sort.A) (cost...) (actual...)

but having its own line on those nodes having the violation seems
reasonable. It should be noticeable when the violations occur and this
does stand out. The less pronounced format would be more appropriate for
the "Disabled: #" display that would appear on every single node; which I
believe is counter-productive. Only marking the violation provides the
same amount of detail and allows for the computation of those counts should
the need arise. As a DBA, though, I do not know how to use that count in a
meaningful way.

In text format we place additional information at the bottom of the query
result. It is worth considering whether to place information before the
planner tree. If that is acceptable the two "Disabled Planner %:" counts
should be moved to before the node tree. This draws immediate attention to
the explain output consumer that this plan is constrained and that other
options, like settings and violations, should be added to the explain
command to show additional details. But the two counts and the node detail
"Disabled: * (footnote)" will always be visible.

The footnote definitely is its own feature added to increase usability.
I'm expecting it to not be accepted given the current design of explain,
and also it seems quite difficult to get good data out of the planner to
make the display accurate. But if we tell someone that a setting they
disable is violated they are going to wonder why.

David J.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2024-10-07 15:11:28 Re: POC, WIP: OR-clause support for indexes
Previous Message Robert Haas 2024-10-07 15:08:49 Re: On disable_cost