Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: BRIN minmax multi - incorrect distance for infinite timestamp/date
Date: 2023-10-19 09:01:44
Message-ID: a4d5563f-b62a-61aa-8022-12eed3fbcf21@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 10/19/23 06:32, Ashutosh Bapat wrote:
> On Wed, Oct 18, 2023 at 8:23 PM Tomas Vondra
> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
>>
>> I did use that many values to actually force "compaction" and merging of
>> points into ranges. We only keep 32 values per page range, so with 2
>> values we'll not build a range. You're right it may still trigger the
>> overflow (we probably still calculate distances, I didn't realize that),
>> but without the compaction we can't check the query plans.
>>
>> However, I agree 60 values may be a bit too much. And I realized we can
>> reduce the count quite a bit by using the values_per_range option. We
>> could set it to 8 (which is the minimum).
>>
>
> I haven't read BRIN code, except the comments in the beginning of the
> file. From what you describe it seems we will store first 32 values as
> is, but later as the number of values grow create ranges from those?
> Please point me to the relevant source code/documentation. Anyway, if
> we can reduce the number of rows we insert, that will be good.
>

I don't think we have documentation other than what's at the beginning
of the file. What the comment tries to explain is that the summary has a
maximum size (32 values by default), and each value can be either a
point or a range. A point requires one value, range requires two. So we
accumulate values one by one - until 32 values that's fine. Once we get
33, we have to merge some of the points into ranges, and we do that in a
greedy way by distance.

For example, this may happen:

33 values
-> 31 values + 1 range [requires 33]
-> 30 values + 1 range [requires 32]
...

The exact steps depend on which values/ranges are picked for the merge,
of course. In any case, there's no difference between the initial 32
values and the values added later.

Does that explain the algorithm? I'm not against clarifying the comment,
of course.

>>>
>>
>> Not sure what you mean by "crash". Yes, it doesn't hit an assert,
>> because there's none when calculating distance for date. It however
>> should fail in the query plan check due to the incorrect order of
>> subtractions.
>>
>> Also, the commit message does not claim to fix overflow. In fact, it
>> says it can't overflow ...
>>
>
>
> Reading the commit message
> "Tests for overflows with dates and timestamps in BRIN ...
>
> ...
>
> The new regression tests check this for date and timestamp data types.
> It adds tables with data close to the allowed min/max values, and builds
> a minmax-multi index on it."
>
> I expected the CREATE INDEX statement to throw an error or fail the
> "Assert(delta >= 0);" in brin_minmax_multi_distance_date(). But a
> later commit mentions that the overflow is not possible.
>

Hmmm, yeah. The comment should mention the date doesn't have issue with
overflows, but other bugs.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2023-10-19 09:05:45 Re: BRIN minmax multi - incorrect distance for infinite timestamp/date
Previous Message Michael Paquier 2023-10-19 08:46:02 Re: pgsql: Generate automatically code and documentation related to wait ev