Re: comment regarding double timestamps; and, infinite timestamps and NaN

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: comment regarding double timestamps; and, infinite timestamps and NaN
Date: 2019-12-30 15:18:29
Message-ID: 20191230151829.GP12890@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Dec 30, 2019 at 09:05:24AM -0500, Tom Lane wrote:
> Justin Pryzby <pryzby(at)telsasoft(dot)com> writes:
> > That seems to be only used for ineq_histogram_selectivity() interpolation of
> > histogram bins. It looks to me that at least isn't working for "special
> > values", and needs to use something other than isnan().
>
> Uh, what? This seems completely wrong to me. We could possibly
> promote DT_NOBEGIN and DT_NOEND to +/- infinity (not NaN), but
> I don't really see the point. They'll compare to other timestamp
> values correctly without that, cf timestamp_cmp_internal().
> The example you give seems to me to be working sanely, or at least
> as sanely as it can given the number of histogram points available,
> with the existing code. In any case, shoving NaNs into the
> computation is not going to make anything better.

As I see it, the problem is that the existing code tests for isnan(), but
infinite timestamps are PG_INT64_MIN/MAX (here, stored in a double), so there's
absurdly large values being used as if they were isnormal().

src/include/datatype/timestamp.h:#define DT_NOBEGIN PG_INT64_MIN
src/include/datatype/timestamp.h-#define DT_NOEND PG_INT64_MAX

On v12, my test gives:
|DROP TABLE t; CREATE TABLE t(t) AS SELECT generate_series(now(), now()+'1 day', '5 minutes');
|INSERT INTO t VALUES('-infinity');
|ALTER TABLE t ALTER t SET STATISTICS 1; ANALYZE t;
|explain analyze SELECT * FROM t WHERE t>='2010-12-29';
| Seq Scan on t (cost=0.00..5.62 rows=3 width=8) (actual time=0.012..0.042 rows=289 loops=1)

vs patched master:
|DROP TABLE t; CREATE TABLE t(t) AS SELECT generate_series(now(), now()+'1 day', '5 minutes');
|INSERT INTO t VALUES('-infinity');
|ALTER TABLE t ALTER t SET STATISTICS 1; ANALYZE t;
|explain analyze SELECT * FROM t WHERE t>='2010-12-29';
| Seq Scan on t (cost=0.00..5.62 rows=146 width=8) (actual time=0.048..0.444 rows=289 loops=1)

IMO 146 rows is a reasonable estimate given a single histogram bucket of
infinite width, and 3 rows is a less reasonable result of returning INT64_MAX
in one place and then handling it as a normal value. The comments in
ineq_histogram seem to indicate that this case is intended to get binfrac=0.5:

| Watch out for the possibility that we got a NaN or Infinity from the
| division. This can happen despite the previous checks, if for example "low" is
| -Infinity.

I changed to use INFINITY, -INFINITY and !isnormal() rather than nan() and
isnan() (although binfrac is actually NAN at that point so the existing test is
ok).

Justin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Anastasia Lubennikova 2019-12-30 15:57:06 Re: Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence
Previous Message Fabien COELHO 2019-12-30 14:40:14 Re: TAP testing for psql's tab completion code