Re: Improve selectivity estimate for range queries

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: hosoya(dot)yuzuko(at)lab(dot)ntt(dot)co(dot)jp, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Improve selectivity estimate for range queries
Date: 2019-01-09 02:55:57
Message-ID: 20190109.115557.33860764.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

At Tue, 08 Jan 2019 16:26:38 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20190108(dot)162638(dot)106314087(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> Hello.
>
> At Fri, 21 Dec 2018 11:50:28 -0500, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote in <28533(dot)1545411028(at)sss(dot)pgh(dot)pa(dot)us>
> > seem that that's just moving the problem around, but I think it
> > might be possible to show that such a value couldn't be computed
> > by scalarltsel given a histogram with no more than 10000 members.
> > (I haven't tried to actually prove that, but it seems intuitive
> > that the set of possible results would be quantized with no more
> > than about 5 digits precision.)

I don't think we need a rigorous proof of that. It should be
enough that exactly 1/3 is a quite natural and common selectivity,
whereas 1/3 + ε is not.
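
As a rough illustration of the quantization argument (not a proof, and
not how scalarltsel actually computes its result), one can enumerate
the plain fractions k/n for up to a given number of histogram members
and count exact matches against a candidate constant. The helper name
count_exact_matches and the k/n model are my own simplifying
assumptions:

```c
#include <stdio.h>

/*
 * Count how many fractions k/n (0 <= k <= n, 1 <= n <= max_members)
 * compare exactly equal to target.
 *
 * Hypothetical helper for illustration only: real histogram
 * selectivity also interpolates within a bucket, which this ignores.
 */
static long
count_exact_matches(double target, int max_members)
{
    long        matches = 0;
    int         n;
    int         k;

    for (n = 1; n <= max_members; n++)
    {
        for (k = 0; k <= n; k++)
        {
            /* store into a double so the comparison uses double precision */
            double      sel = (double) k / (double) n;

            if (sel == target)
                matches++;
        }
    }
    return matches;
}
```

Under this simplified model, a fraction that is not exactly 1/3
differs from it by at least 1/(3n), which is many orders of magnitude
larger than a few LSBs, so only the exact thirds (n = 3k) can match
1/3 and nothing can match a value a few LSBs away from it.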

> FWIW, I got the following result on my environment. It seems
> different enough if this holds on all supported platforms, though
> there still is a case where the result of a sequence of
> arithmetics makes false match.

In theory, a simple selectivity computed for a relation cannot match
the epsilon-adjusted value. (Of course, this is on *my* environment.)

(0.333..)
binary format: 3f d5 55 55 55 55 55 55
x = 0.333333333333333315
231 matches, 79 no_matches

(0.3{13}42..)
binary format: 3f d5 55 55 55 55 55 f1
x = 0.333333333333341975
0 matches, 310 no_matches

(0.3{15}42..)
binary format: 3f d5 55 55 55 55 55 57
x = 0.333333333333333426
0 matches, 310 no_matches

It seems that 0.3{13}42 should properly be 0.3{15}42, which differs
from 1/3 by just two LSBs. I believe C is well standardized on the
translation of such constants. The other DEFAULT_*_SELs are not
compared in this way.
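
The "binary format" and "x =" lines above can be reproduced with a
sketch along the following lines. This is not the program I used for
the match counts; the helper names (double_bits, ulp_distance,
show_double) are made up for illustration, and it assumes double is
IEEE 754 binary64:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Return the raw bit pattern of a double (assumes 64-bit binary64). */
static uint64_t
double_bits(double x)
{
    uint64_t    bits;

    assert(sizeof(bits) == sizeof(x));
    memcpy(&bits, &x, sizeof(bits));
    return bits;
}

/*
 * Distance in LSBs between two positive finite doubles.  For such
 * values the bit patterns are ordered the same way as the values.
 */
static uint64_t
ulp_distance(double a, double b)
{
    uint64_t    ba = double_bits(a);
    uint64_t    bb = double_bits(b);

    return (ba > bb) ? ba - bb : bb - ba;
}

/* Print the label, the bytes (most significant first) and the value. */
static void
show_double(const char *label, double x)
{
    uint64_t    bits = double_bits(x);
    int         shift;

    printf("(%s)\nbinary format:", label);
    for (shift = 56; shift >= 0; shift -= 8)
        printf(" %02x", (unsigned int) ((bits >> shift) & 0xff));
    printf("\nx = %.18f\n", x);
}
```

Calling show_double("0.3{15}42..", 0.33333333333333342) should print
the third block above on such a platform, and
ulp_distance(0.33333333333333342, 1.0 / 3.0) gives the two-LSB
difference mentioned here.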

The attached small patch fixes the case mentioned in this thread,
but I'm not sure where to put a test. A static assertion is not
usable here. An assertion somewhere, perhaps in
clauselist_selectivity, seems somewhat overdone, and I can't find a
suitable place in the regression tests.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment: avoid_false_match_with_default_ineq_sel.patch (text/x-patch, 1.2 KB)
