Re: Teach predtest about IS [NOT] <boolean> proofs

From: James Coleman <jtc331(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Teach predtest about IS [NOT] <boolean> proofs
Date: 2024-01-25 01:08:47
Message-ID: CAAaqYe8CRCdMceeWkbGbrjQEyiP1S21hZYCKjuu=YFygpfJJdQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks for the feedback.

On Mon, Jan 22, 2024 at 12:57 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> James Coleman <jtc331(at)gmail(dot)com> writes:
> > 0001 does the initial pure refactor. 0003 makes a lot of modifications
> > to what we can prove about implication and refutation. Finally, 0003
> > isn't intended to be committed, but attempts to validate more
> > holistically that none of the changes creates any invalid proofs
> > beyond the mostly happy-path tests added in 0004.
>
> > I ended up not tackling changing how test_predtest tests run for now.
> > That's plausibly still useful, and I'd be happy to add that if you
> > generally agree with the direction of the patch and with that
> > abstraction being useful.
>
> > I added some additional verifications to the test_predtest module to
> > prevent additional obvious flaws.
>
> I looked through 0001 and made some additional cosmetic changes,
> mostly to get comments closer to the associated code; I also
> ran pgindent on it (see v5-0001 attached). That seems pretty
> committable to me at this point.

Great.

> I also like your 0002 additions to
> test_predtest.c (although why the mixture of ERROR and WARNING?
> ISTM they should all be WARNING, so we can press on with the test).

My reasoning is that one is a major error in something larger than
predtest, while the other is clearly "your code change isn't
accurate". The surrounding code seems to be drawing a distinction also
(it uses both ERROR and WARNING), and so I was trying to parallel that
appropriately.

I'm fine with making both WARNING though.

But does that also mean we should make other such cases WARNING as
well? For example, the query not returning two boolean columns doesn't
really seem like a reason to break subsequent tests.

I haven't changed this yet pending these questions.

> One other thought is that maybe separating out
> predicate_implied_not_null_by_clause should be part of 0001?

Would you prefer to commit a refactor along with some functionality
changes? Or one patch with the pure refactor and then a second patch
with the predicate_implied_not_null_by_clause changes?

> I'm less excited about the rest of v4-0002.
>
> @@ -740,6 +747,16 @@ predicate_refuted_by_recurse(Node *clause, Node *predicate,
> !weak))
> return true;
>
> + /*
> + * Because weak refutation expands the allowed outcomes for B
> + * from "false" to "false or null", we can additionally prove
> + * weak refutation in the case that strong refutation is proven.
> + */
> + if (weak && not_arg &&
> + predicate_implied_by_recurse(predicate, not_arg,
> + true))
> + return true;
> +
> switch (pclass)
> {
> case CLASS_AND:
>
> I don't buy this bit at all. If the prior recursive call fails to
> prove weak refutation in a case where strong refutation holds, how is
> that not a bug lower down?

This is one of the last additions I made while authoring the most
recent version of the patch, and at first I thought it suggested a bug
lower down also.

However the cases proven by these lines ("x is not false" is weakly
refuted by "not x", "x is false", and "x = false") correctly do not
have their not arg ("x") strongly implied by "x is not false" since
both "x is null" and "x is true" would have to imply "x", which
obviously doesn't hold. These aren't cases we're handling directly in
predicate_refuted_by_simple_clause.

This is caused by the asymmetry between implication and refutation
that I noted in my addition to the comments nearer the top of the
file:

+ * A notable difference between implication and refutation proofs is that
+ * strong/weak refutations don't vary the input of A (both must be true) but
+ * vary the allowed outcomes of B (false vs. non-truth), while for implications
+ * we vary both A (truth vs. non-falsity) and B (truth vs. non-falsity).

Put another way in the comments I added in test_predtest.c:

+ /* Because weak refutation proofs are a strict subset of strong refutation
+ * proofs (since for "A => B" "A" is always true) we ought never
have strong
+ * refutation hold when weak refutation does not.
+ *
+ * We can't make the same assertion for implication since moving
from strong
+ * to weak implication expands the allowed values of "A" from
true to either
+ * true or NULL.

We could decide to handle this particular failing case explicitly in
predicate_refuted_by_simple_clause as opposed to inferring it by
whether or not implication by the not-arg holds, but I suspect that
leaves us open to other cases we should be to prove refutation for but
don't.

Alternatively (to avoid unnecessary CPU burn) we could modify
predicate_implied_by_recurse (and functionals called by it) to have a
argument beyond "weak = true/false" Ie.g., an enum that allows for
something like "WEAK", "STRONG", and "EITHER". That's a bigger change,
so I didn't want to do that right away unless there was agreement on
that direction.

I haven't changed this yet pending this discussion.

> Moreover, in order to mask such a bug,
> you're doubling the time taken by failed proofs, which is an
> unfortunate thing --- we don't like spending a lot of time on
> something that fails to improve the plan.

See above.

> @@ -1138,32 +1155,114 @@ predicate_implied_by_simple_clause(Expr *predicate, Node *clause,
> Assert(list_length(op->args) == 2);
> rightop = lsecond(op->args);
>
> - /*
> - * We might never see a null Const here, but better check
> - * anyway
> - */
> - if (rightop && IsA(rightop, Const) &&
> - !((Const *) rightop)->constisnull)
> + if (rightop && IsA(rightop, Const))
> {
> + Const *constexpr = (Const *) rightop;
> Node *leftop = linitial(op->args);
>
> - if (DatumGetBool(((Const *) rightop)->constvalue))
> - {
> - /* X = true implies X */
> - if (equal(predicate, leftop))
> - return true;
> - }
> + if (constexpr->constisnull)
> + return false;
> +
> + if (DatumGetBool(constexpr->constvalue))
> + return equal(predicate, leftop);
> else
> - {
> - /* X = false implies NOT X */
> - if (is_notclause(predicate) &&
> - equal(get_notclausearg(predicate), leftop))
> - return true;
> - }
> + return is_notclause(predicate) &&
> + equal(get_notclausearg(predicate), leftop);
> }
> }
> }
> break;
>
> I don't understand what this bit is doing ... and the fact that
> the patch removes all the existing comments and adds none isn't
> helping that. What it seems to mostly be doing is adding early
> "return false"s, which I'm not sure is a good thing, because
> it seems possible that operator_predicate_proof could apply here.

I was mostly bringing it in line with the style I have elsewhere in
the patch by pulling out the Const* into a variable to avoid repeated
casting.

That being said, you're right that I didn't catch in the many
revisions along the way that I'd added unnecessary early returns and
lost the comments. Fixed both of those in the next version.

> + case IS_UNKNOWN:
> + /*
> + * When the clause is in the form "foo IS UNKNOWN" then
> + * we can prove weak implication of a predicate that
> + * is strict for "foo" and negated. This doesn't work
> + * for strong implication since if "foo" is "null" then
> + * the predicate will evaluate to "null" rather than
> + * "true".
> + */
>
> The phrasing of this comment seems randomly inconsistent with others
> making similar arguments.

Changed.

> + case IS_TRUE:
> /*
> - * If the predicate is of the form "foo IS NOT NULL",
> - * and we are considering strong implication, we can
> - * conclude that the predicate is implied if the
> - * clause is strict for "foo", i.e., it must yield
> - * false or NULL when "foo" is NULL. In that case
> - * truth of the clause ensures that "foo" isn't NULL.
> - * (Again, this is a safe conclusion because "foo"
> - * must be immutable.) This doesn't work for weak
> - * implication, though. Also, "row IS NOT NULL" does
> - * not act in the simple way we have in mind.
> + * X implies X is true
> + *
> + * We can only prove strong implication here since
> + * `null is true` is false rather than null.
> */
>
> This hardly seems like an improvement on the comment. (Also, here and
> elsewhere, could we avoid using two different types of quotes?)

I think the git diff is confusing here. The old comment was about a
predicate "foo IS NOT NULL", but the new comment is about a predicate
"foo IS TRUE".

I did fix the usage of backticks though.

> + /* X is unknown weakly implies X is not true */
> + if (weak && clausebtest->booltesttype == IS_UNKNOWN &&
> + equal(clausebtest->arg, predbtest->arg))
> + return true;
>
> Maybe I'm confused, but why is it only weak?

You're not confused; this seems like a mistake (same with the IS NOT
FALSE below it).

> + /*
> + * When we know what the predicate is in the form
> + * "foo IS UNKNOWN" then we can prove strong and
> + * weak refutation together. This is because the
> + * limits imposed by weak refutation (allowing
> + * "false" instead of just "null") is equivalently
> + * helpful since "foo" being "false" also refutes
> + * the predicate. Hence we pass weak=false here
> + * always.
> + */
>
> This comment doesn't make sense to me either.

I rewrote the comment in the attached revision; let me know if that helps.

> + /* TODO: refactor this into switch statements also? */
>
> Let's drop the TODO comments.

This one was meant to be a question for you in review: do we want to
make that change? Or are we content to leave it as-is?

Either way, removed.

> + /*
> + * We can recurse into "not foo" without any additional processing because
> + * "not (null)" evaluates to null. That doesn't work for allow_false,
> + * however, since "not (false)" is true rather than null.
> + */
> + if (is_notclause(clause) &&
> + clause_is_strict_for((Node *) get_notclausearg(clause), subexpr, false))
> + return true;
>
> Not exactly convinced by this. The way the comment is written, I'd
> expect to not call clause_is_strict_for at all if allow_false. If
> it's okay to call it anyway and pass allow_false = false, you need
> to defend that, which this comment isn't doing.

I updated the comment to clarify. The restriction on allow_false
(always passing along false on the recursion case) is already
documented as a requirement in the function comment, but I wanted the
comment here to explain why that was necessary here, since in my
opinion it's not immediately obvious reading the function comment why
such a restriction would necessarily hold true for all recursion
cases.

Regards,
James Coleman

Attachment Content-Type Size
v6-0002-Teach-predtest.c-about-BooleanTest.patch application/octet-stream 51.1 KB
v6-0003-Add-temporary-all-permutations-test.patch application/octet-stream 32.9 KB
v6-0001-Use-switch-statements-in-predicate_-implied-refut.patch application/octet-stream 14.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2024-01-25 01:17:31 Re: LLVM 18
Previous Message Michael Paquier 2024-01-25 00:38:23 Re: Sequence Access Methods, round two