Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

From: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: BRIN minmax multi - incorrect distance for infinite timestamp/date
Date: 2023-10-18 10:47:40
Message-ID: CAExHW5vi3Yjk_5vvwtZWrvy-ZWbx68hbET_GpyH5fjS=2KMZrw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Oct 18, 2023 at 1:55 AM Tomas Vondra
<tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
> Hi,
>
> Here's a couple cleaned-up patches fixing the various discussed here.
> I've tried to always add a regression test demonstrating the issue
> first, and then fix it in the next patch.

It will be good to commit the test changes as well.

>
> In particular, this deals with these issues:
>
> 1) overflows in distance calculation for large timestamp values (0002)

I could reduce the SQL for timestamp overflow test to just
-- test overflows during CREATE INDEX with extreme timestamp values
CREATE TEMPORARY TABLE brin_timestamp_test(a TIMESTAMPTZ);

SET datestyle TO iso;

INSERT INTO brin_timestamp_test VALUES
('4713-01-01 00:00:30 BC'),
('294276-12-01 00:00:01');

CREATE INDEX ON brin_timestamp_test USING brin (a
timestamptz_minmax_multi_ops) WITH (pages_per_range=1);

I didn't understand the purpose of adding 60 odd values to the table.
It didn't tell which of those values triggers the overflow. Minimal
set above is much easier to understand IMO. Using a temporary table
just avoids DROP TABLE statement. But I am ok if you want to use
non-temporary table with DROP.

Code changes in 0002 look fine. Do we want to add a comment "cast to a
wider datatype to avoid overflow"? Or is that too explicit?

The code changes fix the timestamp issue but there's a diff in case of

>
> 2) incorrect subtraction in distance for date values (0003)

The test case for date brin index didn't crash though. Even after
applying 0003 patch. The reason why date subtraction can't overflow is
a bit obscure. PostgreSQL doesn't allow dates beyond 4714-12-31 BC
because of the code below
#define IS_VALID_DATE(d) \
((DATETIME_MIN_JULIAN - POSTGRES_EPOCH_JDATE) <= (d) && \
(d) < (DATE_END_JULIAN - POSTGRES_EPOCH_JDATE))
This prevents the lower side to be well within the negative int32
overflow threshold and we always subtract higher value from the lower
one. May be good to elaborate this? A later patch does use float 8
casting eliminating "any" possibility of overflow. So the comment may
not be necessary after squashing all the changes.

>
> 3) incorrect distance for infinite date/timestamp values (0005)

The tests could use a minimal set of rows here too.

The code changes look fine and fix the problem seen with the tests alone.

>
> 4) failing distance for extreme interval values (0007)

I could reproduce the issue with a minimal set of values
-- test handling of overflow for interval values
CREATE TABLE brin_interval_test(a INTERVAL);

INSERT INTO brin_interval_test VALUES
('177999985 years'),
('-178000000 years');

CREATE INDEX ON brin_interval_test USING brin (a
interval_minmax_multi_ops) WITH (pages_per_range=1);
DROP TABLE brin_interval_test;

The code looks fine and fixed the issue seen with the test.

We may want to combine various test cases though. Like the test adding
infinity and extreme values could be combined. Also the number of
values it inserts in the table for the reasons stated above.

>
> All the problems except "2" have been discussed earlier, but this seems
> a bit more serious than the other issues, as it's easier to hit. It
> subtracts the values in the opposite order (smaller - larger), so the
> distances are negated. Which means we actually merge the values from the
> most distant ones, and thus are "guaranteed" to build very a very
> inefficient summary. People with multi-minmax indexes on "date" columns
> probably will need to reindex.
>

Right. Do we highlight that in the commit message so that the person
writing release notes picks it up from there?

--
Best Wishes,
Ashutosh Bapat

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2023-10-18 10:54:10 Re: Synchronizing slots from primary to standby
Previous Message Alexander Korotkov 2023-10-18 10:34:49 Re: On login trigger: take three