From: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
---|---|
To: | Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Bug in GiST paring heap comparator |
Date: | 2019-09-02 06:28:02 |
Message-ID: | 7362fcac-3664-d960-f0fa-77c24e62a7b3@iki.fi |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 02/09/2019 07:54, Alexander Korotkov wrote:
> Andrey Borodin noticed me that results of some KNN-GIST tests are
> obviously wrong and don't match results of sort node.
>
> SELECT * FROM point_tbl ORDER BY f1 <-> '0,1';
> f1
> -------------------
> (10,10)
> (NaN,NaN)
> (0,0)
> (1e-300,-1e-300)
> (-3,4)
> (-10,0)
> (-5,-12)
> (5.1,34.5)
>
> (1e+300,Infinity)
> (10 rows)
So we've memorized incorrect results in the expected output. Ouch!
> It appears to be related to implementation of comparison function in
> pairing heap used as priority queue for KNN. It used plain float
> comparison, which doesn't handle Inf and Nan values well. Attached
> patch replaced it with float8_cmp_internal().
>
> Also, note that with patch KNN results still don't fully match results
> of sort node.
>
> SELECT * FROM point_tbl ORDER BY f1 <-> '0,1';
> f1
> -------------------
> (0,0)
> (1e-300,-1e-300)
> (-3,4)
> (-10,0)
> (10,10)
> (-5,-12)
> (5.1,34.5)
> (1e+300,Infinity)
>
> (NaN,NaN)
> (10 rows)
>
> NULL and '(NaN,NaN)' are swapped. It happens so, because we assume
> distance to NULL to be Inf, while float8_cmp_internal() assumes NaN to
> be greater than NULL. If even we would assume distance to NULL to be
> Nan, it doesn't guarantee that NULLs would be last. It looks like we
> can handle this only by introduction array of "distance is null" flags
> to GISTSearchItem. But does it worth it?
I don't think we have much choice, returning wrong results is not an
option. At first I thought we could set the "recheck" flag if we see
NULLs or NaNs, but that won't work because rechecking is not supported
with Index-Only Scans.
Looking at the corresponding SP-GiST code,
pairingheap_SpGistSearchItem_cmp() gets this right. I'd suggest
copy-pasting the implementation from there, so that they would be as
similar as possible. (SP-GiST gets away with just one isNull flag,
because it doesn't support multi-column indexes. In GiST, you need an
array, as you said.)
- Heikki
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Eisentraut | 2019-09-02 06:32:22 | pg_dump --exclude-* options documentation |
Previous Message | Andrey Borodin | 2019-09-02 05:11:42 | Re: Bug in GiST paring heap comparator |