Re: Self contradictory examining on rel's baserestrictinfo

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: ro b <bigbro_wq(at)hotmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Self contradictory examining on rel's baserestrictinfo
Date: 2025-01-23 21:59:53
Message-ID: 233451.1737669593@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Wed, Nov 27, 2024 at 10:28 AM ro b <bigbro_wq(at)hotmail(dot)com> wrote:
>> I mean why can't draw the conclusions
>> like the case a>1 and a>2 which can be simplified
>> to a>2 if the operator > is btree opclass.

> This question has already been discussed in some detail on this email
> thread. First, we sometimes do detect it, as discussed by Peter.
> Second, it is an unusual case and so not worth spending a lot of CPU
> cycles to detect, as discussed by Tom.

> We might still accept a patch to improve things in this area. For
> example, if somebody could find a way to change the code so that we
> are able to detect more cases without needing to spend more CPU
> cycles, I think everyone would like that. However, that may be a
> tricky goal to accomplish.

This discussion trailed off without any final resolution about what
to do with the patch submission. I think we owe the submitter at
least a minimal actual review, so ...

I took a quick look at the "self_contradictory.sql" regression test
included in the patch, with an eye to seeing how much of it we pass
already when constraint_exclusion is set to "on". The answer is
"a fair amount", in particular we successfully detect many of the
cases where the clauses can be proved contradictory. The ones
where we fail to do so are mostly cases like

explain (costs off)
SELECT * FROM self_a WHERE z IS UNKNOWN AND z IS NOT UNKNOWN;
- QUERY PLAN
---------------------------
- Result
- One-Time Filter: false
+ QUERY PLAN
+---------------------------------------------------
+ Seq Scan on self_a
+ Filter: ((z IS UNKNOWN) AND (z IS NOT UNKNOWN))
(2 rows)

This is not a fundamental shortcoming in the system, it's just
that optimizer/util/predtest.c has not been built out very far
with respect to BooleanTest nodes. We could certainly do that,
and it wouldn't add much runtime for queries that contain no
BooleanTest conditions, but I have to question whether it's
worth the trouble. Is a query like the above really common?

There are also a bunch of cases that are not contradictory
but that the patch manages to simplify, for example

explain (costs off)
SELECT * FROM self_a WHERE a <= 10 AND a = 10;
- QUERY PLAN
---------------------
+ QUERY PLAN
+------------------------------------
Seq Scan on self_a
- Filter: (a = 10)
+ Filter: ((a <= 10) AND (a = 10))
(2 rows)

The constraint_exclusion framework can't make this sort of
transformation because it is only focused on proving a relation's
quals to be contradictory or not. So this part is new work ...
but again I have to wonder if this query pattern is common enough
to be worth spending planner cycles to check for.

As Peter mentioned, we do actually have logic that performs the
equivalent transformation within btree indexqual setup. However, the
context is pretty different: first, the work of identifying clauses
that bear on the same variable has already been done, and second we
really have to do at least some of this work on the way to identifying
the appropriate initial search condition for btree descent. It's not
apparent to me that it's worth having a whole new planner pass that
is searching for clauses involving the same variable. That is not
especially cheap if there are a lot of clauses. If we were to pursue
this, I'd want to think about somehow integrating it with the work
that indxpath.c already does to collect index-relevant conditions.

So in short, my main review comment is that you should be looking
to integrate with existing planner functionality such as predtest.c
and indxpath.c, rather than writing a few thousand lines of
not-connected-to-anything-else code. It's too hard to make the
case that these optimizations are worth the cost if they require
an entire separate analysis pass.

Some other comments:

* You need to work harder on the comments in whatever you write.
contradictory.c is drastically undercommented, and what there is
is frequently unhelpful, like the obviously copied-and-pasted
file header. You also did things like adding entirely-undocumented
fields to common structures such as RestrictInfo. This certainly
wouldn't be committable as-is; but it also makes the patch painful
to review in any detail, since the reviewer must reverse-engineer
a lot of information that should have been provided by comments.

* The side-effects on existing regression test cases are pretty
concerning: it's not clear that those tests still exercise what
they were meant to. (You can be sure that a test case intentionally
written with redundant conditions was not written that way just to
provide a case you could optimize later.) We'd have to analyze
all those test cases in detail and figure out whether the
simplification is OK or we need to provide some way to defeat it.

I'm going to mark this submission Returned With Feedback, because
I don't foresee anything very close to this getting committed.
But I'd encourage you to pursue the ideas in other forms as
suggested by this discussion.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2025-01-23 22:34:17 Re: Adding skip scan (including MDAM style range skip scan) to nbtree
Previous Message Daniel Verite 2025-01-23 21:52:16 Re: Inconsistent string comparison using modified ICU collations