Re: PATCH: Using BRIN indexes for sorted output

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: PATCH: Using BRIN indexes for sorted output
Date: 2023-07-14 14:42:43
Message-ID: CAEze2WhiBeTOnwy8iDVPTb_joZ9-LDR_7DA0xEcZBUeyL_otOw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 14 Jul 2023 at 16:21, Tomas Vondra
<tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
>
>
>
> On 7/11/23 13:20, Matthias van de Meent wrote:
>> On Mon, 10 Jul 2023 at 22:04, Tomas Vondra
>> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>>> Approximation in what sense? My understanding was we'd get a range of
>>> distances that we know covers all rows in that range. So the results
>>> should be accurate, no?
>>
>> The distance is going to be accurate only to the degree that the
>> summary can produce accurate distances for the datapoints it
>> represents. That can be quite imprecise due to the nature of the
>> contained datapoints: a summary of the points (-1, -1) and (1, 1) will
>> have a minimum distance of 0 to the origin, where the summary (-1, 0)
>> and (-1, 0.5) would have a much more accurate distance of 1.
>
> Ummm, I'm probably missing something, or maybe my mental model of this
> is just wrong, but why would the distance for the second summary be more
> accurate? Or what does "more accurate" mean?
>
> Is that about the range of distances for the summary? For the first
> range the summary is a bounding box [(-1,1), (1,1)] so all we know the
> points may have distance in range [0, sqrt(2)]. While for the second
> summary it's [1, sqrt(1.25)].

Yes; I was trying to refer to the difference between what results you
get from the summary vs what results you get from the actual
datapoints: In this case, for finding points which are closest to the
origin, the first bounding box has a less accurate estimate than the
second.

> > The point I was making is that the summary can only approximate the
> > distance, and that approximation is fine w.r.t. the BRIN sort
> > algoritm.
> >
>
> I think as long as the approximation (whatever it means) does not cause
> differences in results (compared to not using an index), it's OK.

Agreed.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2023-07-14 14:47:33 Re: BRIN indexes vs. SK_SEARCHARRAY (and preprocessing scan keys)
Previous Message Tomas Vondra 2023-07-14 14:31:49 Re: logical decoding and replication of sequences, take 2