infinite histogram bounds and nan (Re: comment regarding double timestamps; and, infinite timestamps and NaN)

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: infinite histogram bounds and nan (Re: comment regarding double timestamps; and, infinite timestamps and NaN)
Date: 2020-01-02 13:55:39
Message-ID: 20200102135538.GB12890@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Dec 30, 2019 at 02:18:17PM -0500, Tom Lane wrote:
> > On v12, my test gives:
> > |DROP TABLE t; CREATE TABLE t(t) AS SELECT generate_series(now(), now()+'1 day', '5 minutes');
> > |INSERT INTO t VALUES('-infinity');
> > |ALTER TABLE t ALTER t SET STATISTICS 1; ANALYZE t;
> > |explain analyze SELECT * FROM t WHERE t>='2010-12-29';
> > | Seq Scan on t (cost=0.00..5.62 rows=3 width=8) (actual time=0.012..0.042 rows=289 loops=1)
>
> This is what it should do. There's only one histogram bucket, and
> it extends down to -infinity, so the conclusion is going to be that
> the WHERE clause excludes all but a small part of the bucket. This
> is the correct answer based on the available stats; the problem is
> not with the calculation, but with the miserable granularity of the
> available stats.
>
> > vs patched master:
> > |DROP TABLE t; CREATE TABLE t(t) AS SELECT generate_series(now(), now()+'1 day', '5 minutes');
> > |INSERT INTO t VALUES('-infinity');
> > |ALTER TABLE t ALTER t SET STATISTICS 1; ANALYZE t;
> > |explain analyze SELECT * FROM t WHERE t>='2010-12-29';
> > | Seq Scan on t (cost=0.00..5.62 rows=146 width=8) (actual time=0.048..0.444 rows=289 loops=1)
>
> This answer is simply broken. You've caused it to estimate half
> of the bucket, which is an insane estimate for the given bucket
> boundaries and WHERE constraint.
>
> > IMO 146 rows is a reasonable estimate given a single histogram bucket of
> > infinite width,
>
> No, it isn't.

When using floats, v12 also returns half the histogram:

DROP TABLE t; CREATE TABLE t(t) AS SELECT generate_series(0, 99, 1)::float;
INSERT INTO t VALUES('-Infinity');
ALTER TABLE t ALTER t SET STATISTICS 1; ANALYZE t;
explain analyze SELECT * FROM t WHERE t>='50';
Seq Scan on t (cost=0.00..2.26 rows=51 width=8) (actual time=0.014..0.020 rows=50 loops=1)

I'm fine if the isnan() logic changes, but the comment indicates it's intended
to be hit for an infinite histogram bound, but that doesn't work for timestamps
(convert_to_scalar() should return (double)INFINITY and not
(double)INT64_MIN/MAX).

On Mon, Dec 30, 2019 at 02:18:17PM -0500, Tom Lane wrote:
> Justin Pryzby <pryzby(at)telsasoft(dot)com> writes:
> > On Mon, Dec 30, 2019 at 09:05:24AM -0500, Tom Lane wrote:
> >> Uh, what? This seems completely wrong to me. We could possibly
> >> promote DT_NOBEGIN and DT_NOEND to +/- infinity (not NaN), but
> >> I don't really see the point. They'll compare to other timestamp
> >> values correctly without that, cf timestamp_cmp_internal().
> >> The example you give seems to me to be working sanely, or at least
> >> as sanely as it can given the number of histogram points available,
> >> with the existing code. In any case, shoving NaNs into the
> >> computation is not going to make anything better.
>
> > As I see it, the problem is that the existing code tests for isnan(), but
> > infinite timestamps are PG_INT64_MIN/MAX (here, stored in a double), so there's
> > absurdly large values being used as if they were isnormal().
>
> I still say that (1) you're confusing NaN with Infinity, and (2)
> you haven't actually shown that there's a problem to fix.
> These endpoint values are *not* NaNs.

I probably did confuse it while trying to make the behavior match the comment
for timestamps.
The Subject says NAN since isnan(binfrac) is what's supposed to be hit for that
case.

The NAN is intended to come from:

|binfrac = (val - low) / (high - low);

which is some variation of -inf / inf.

Justin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2020-01-02 13:56:59 Re: Removal of support for OpenSSL 0.9.8 and 1.0.0
Previous Message Tom Lane 2020-01-02 13:52:17 Re: Decade indication