Re: Possible Performance Regression with Transitive Comparisons vs. Constants

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: sthomas(at)optionshouse(dot)com
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Possible Performance Regression with Transitive Comparisons vs. Constants
Date: 2012-09-28 22:25:57
Message-ID: 16678.1348871157@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Shaun Thomas <sthomas(at)optionshouse(dot)com> writes:
> The first part of the question stands, though... Why isn't the optimizer
> substituting these values? a.created_date should be exactly equivalent
> to '2012-05-05', but it's clearly not being treated that way.

No version of Postgres has ever substituted constants in the way you're
imagining, and I wouldn't hold my breath waiting for it to happen. The
reason is that "x = constant" only creates a requirement for x to be
btree-equal to the constant, and btree equality doesn't guarantee
equality for all purposes. In this example we'd have to assume that
btree-equality guaranteed identical results from the date + interval
addition operator. While that happens to be true for this operator,
the planner can't know that.

A real-world example of the kind of case I'm worried about is that in
IEEE-spec float arithmetic, minus zero and plus zero compare equal ---
but there are functions that give different results for the two values.
Another is that the char(n) type's equality operator will say that
'foo' and 'foo ' are equal, but those values are definitely
distinguishable by some operations, eg length().

There are some cases where the planner can effectively propagate
constants, but they rely on transitivity of btree equality operators.
For instance if we have x = constant and x = y, with compatible equality
operators, we can deduce y = constant. But that doesn't imply that y
*is* the constant, just that it's btree-equal to it.

There have been some discussions of inventing a stronger notion of
equality than btree equality, so that we could know when it's safe to
make this type of substitution; but nothing's been done about that.
Personally I think it's fairly rare that any real win would come from
this type of constant substitution, and so it's very likely that adding
it would just create a net drag on performance (because of the added
planner cycles spent looking for substitution opportunities, which would
happen in every query whether it got any benefit or not).

Another point here is that at least for the one side of your BETWEEN
operator, b.created_dt >= a.created_dt, we could in fact combine that
with a.created_dt = '2012-05-05' to deduce b.created_dt >= '2012-05-05',
because we know from the btree opclass for dates that these = and >=
operators have compatible semantics. Again though, it seems likely that
the cost of looking for such opportunities would outweigh the benefits.
In this particular example I don't think it'd do much good --- the
reason the planner isn't picking a plan similar to the "fast" one is
that it doesn't know that the BETWEEN with variable limits will select
only a relatively small part of the table. Providing a constant limit
for just one side wouldn't fix that.

regards, tom lane

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Matt Daw 2012-09-28 22:56:10 Re: Query plan, nested EXISTS
Previous Message Matt Daw 2012-09-28 21:47:47 Re: Query plan, nested EXISTS