From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Regina Obe <lr(at)pcorp(dot)us> |
Cc: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: What is "index returned tuples in wrong order" for recheck supposed to guard against? |
Date: | 2017-01-03 16:18:55 |
Message-ID: | CA+TgmoauhLf6R07sAUzQiRcstF5KfRw7nwiWn4VZgiSF8MaQaw@mail.gmail.com |
Lists: | pgsql-hackers |
On Tue, Jan 3, 2017 at 12:36 AM, Regina Obe <lr(at)pcorp(dot)us> wrote:
>> cmp would return 0 if the estimated distance returned by the index AM were greater than the actual distance.
>> The estimated distance can be less than the actual distance, but it isn't allowed to be more. See gist_bbox_distance for an example of a "lossy" distance calculation, and more generally "git show 35fcb1b3d038a501f3f4c87c05630095abaaadab".
>
> Did you mean would return < 0 ?
Yes, sorry.
> Since I thought 0 meant exact and not where it's Erroring?
>
> I think for points then maybe we should turn it off, as this could just be floating point issues with the way we compute the index.
> That would explain why it doesn't happen for other cases like polygon / point in our code
> or polygon / polygon in our code, since the box-box distance in our code would always be <= actual distance for those.
>
> So maybe the best course of action is just for us to inspect the geometries and, if both are points, just disable recheck.
>
> It's still not quite clear to me, even looking at that git commit, why those need to error instead of going through recheck, aside from efficiency.
The code that reorders the returned tuples assumes that (1) the actual
distance is always greater than or equal to the estimated distance and
(2) the index returns the tuples in order of increasing estimated
distance. Imagine that the estimated distances are 0, 1, 2, 3... and
the real distances are 2,3,4,5... When it sees the
estimated-distance-0 tuple it computes that the actual distance is 2,
but it doesn't know whether there's going to be a tuple later with an
actual distance between 0 and 2, so it buffers the tuple. When it sees
the estimated-distance-1 tuple it computes that the actual distance is
3, and now it knows there won't be any more estimated or actual
distances between 0 and 1, but there could still be a tuple with an
estimated distance between 1 and 2 whose actual distance is also between 1
and 2, so it buffers the second tuple as well. When it sees the third
tuple, with estimated distance 2, it now knows that there won't be any
further tuples whose estimated or actual distance is less than 2. So
now it can emit the first tuple that it saw, because that had an
actual distance of 2 and from this point forward the index will never
return anything with a smaller estimated or actual distance. The
estimated-distance-1 tuple still has to stay in the buffer, though,
until we see a tuple whose estimated distance is greater than that
tuple's actual distance (3).
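
To make the buffering concrete, here is a minimal Python sketch of the reordering logic described above. It is an illustration only, not the actual PostgreSQL executor code: the function name reorder_by_actual_distance, the (estimated, actual, label) tuple layout, and the use of a ValueError are all assumptions made for this example. It assumes the index hands back tuples in order of increasing estimated distance, and that a buffered tuple can be released once the current estimated distance is at least that tuple's actual distance.

```python
import heapq


def reorder_by_actual_distance(index_tuples):
    """Yield (label, actual) in order of increasing actual distance.

    index_tuples: iterable of (estimated, actual, label), assumed to
    arrive in order of increasing estimated distance (hypothetical layout).
    """
    pending = []  # min-heap of buffered tuples, keyed on actual distance
    for seq, (estimated, actual, label) in enumerate(index_tuples):
        # Assumption (1): the estimate is a lower bound on the real distance.
        if actual < estimated:
            raise ValueError("index returned tuples in wrong order")
        # Nothing later can have an estimated (hence actual) distance below
        # `estimated`, so buffered tuples at or below it are safe to emit.
        while pending and pending[0][0] <= estimated:
            buffered_actual, _, buffered_label = heapq.heappop(pending)
            yield buffered_label, buffered_actual
        # seq breaks ties so labels are never compared by the heap.
        heapq.heappush(pending, (actual, seq, label))
    # The index is exhausted: drain the buffer in actual-distance order.
    while pending:
        buffered_actual, _, buffered_label = heapq.heappop(pending)
        yield buffered_label, buffered_actual


# The example from the text: estimates 0, 1, 2, 3 and real distances 2, 3, 4, 5.
example = [(0, 2, "t0"), (1, 3, "t1"), (2, 4, "t2"), (3, 5, "t3")]
result = list(reorder_by_actual_distance(example))
```

In this sketch t0 is only released when the estimated-distance-2 tuple arrives, and t1 only when the estimated-distance-3 tuple arrives, matching the walkthrough. If the index ever reports an estimate larger than the recomputed distance, the lower-bound assumption is broken and the buffer can no longer guarantee correct output order, which is why the sketch raises an error rather than continuing.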
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company