Re: Have I found an interval arithmetic bug?

From: Bryn Llewellyn <bryn(at)yugabyte(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane PostgreSQL <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Zhihong Yu <zyu(at)yugabyte(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Isaac Morland <isaac(dot)morland(at)gmail(dot)com>, John W Higgins <wishdev(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Have I found an interval arithmetic bug?
Date: 2021-07-22 01:39:26
Message-ID: 654FEA5F-79CF-414A-9801-F1482EEBCD88@yugabyte.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

> On 21-Jul-2021, at 17:07, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>
> On Wed, Jul 21, 2021 at 01:29:49PM -0400, Tom Lane wrote:
>> Bryn Llewellyn <bryn(at)yugabyte(dot)com> writes:
>>> It was me that started the enormous thread with the title “Have I found an interval arithmetic bug?” on 01-Apr-2021. I presented this testcase:
>>
>>>> select interval '-1.7 years'; -- -1 years -8 mons
>>>>
>>>> select interval '29.4 months'; -- 2 years 5 mons 12 days
>>>>
>>>> select interval '-1.7 years 29.4 months'; -- 8 mons 12 days << wrong
>>>> select interval '29.4 months -1.7 years'; -- 9 mons 12 days
>>>>
>>>> select interval '-1.7 years' + interval '29.4 months'; -- 9 mons 12 days
>>>> select interval '29.4 months' + interval '-1.7 years'; -- 9 mons 12 days
>>
>>> The consensus was that the outcome that I flagged with “wrong” does indeed have that status.
>>
>> Yeah, I think it's self-evident that your last four cases should
>> produce the same results. Whether '9 mons 12 days' is the best
>> possible result is debatable --- in a perfect world, maybe we'd
>> produce '9 mons' exactly --- but given that the first two cases
>> produce what they do, that does seem self-consistent. I think
>> we should be setting out to fix that outlier without causing
>> any of the other five results to change.
>
> OK, I decided to reverse some of the changes I was proposing once I
> started to think about the inaccuracy of not spilling down from 'weeks'
> to seconds when hours also appear. The fundamental issue is that the
> months-to-days conversion is almost always an approximation, while the
> days to seconds conversion is almost always accurate. This means we are
> never going to have consistent spill-down that is useful.
>
> Therefore, I went ahead and accepted that years and larger units spill
> only to months, months spill only to days, and weeks and lower spill all
> the way down to seconds. I also spelled this out in the docs, and
> explained why we have this behavior.
>
> Also, with my patch, the last four queries return the same result
> because of the proper rounding also added by the patch, attached.

Your statement

“months-to-days conversion is almost always an approximation, while the days to seconds conversion is almost always accurate.”

is misleading. Any conversion like these (and also the “spill up” conversions that the justify_hours(), justify_days(), and justify_interval() built-in functions bring) are semantically dangerous because of the different rules for adding a pure months, a pure days, or a pure seconds interval to a timestamptz value.

Unless you avoid mixed interval values, then it’s so hard (even though it is possible) to predict the outcomes of interval arithmetic. Rather, all you get is emergent behavior that I fail to see can be relied upon in deliberately designed application code. Here’s a telling example:

set timezone = 'America/Los_Angeles';
with
c as (
select
'2021-03-13 19:00:00 America/Los_Angeles'::timestamptz as d,
'25 hours'::interval as i)
select
d + i as "d + i",
d + justify_hours(i) as "d + justify_hours(i)"
from c;

This is the result:

d + i | d + justify_hours(i)
------------------------+------------------------
2021-03-14 21:00:00-07 | 2021-03-14 20:00:00-07

The two results are different, even though the native equality test shows that the two different interval values are the same:

with
c as (select '25 hours'::interval as i)
select (i = justify_hours(i))::text
from c;

The result is TRUE.

The only route to sanity is to use only pure interval values (i.e. where only one of the fields of the internal [mm, dd, ss] representation is non-zero.

I mentioned that you can use a set of three domain types to enforce your intended practice here.

In other words, by programming application code defensively, it’s possible to insulate oneself entirely from the emergent behavior of the decades old PG code that implements the unconstrained native interval functionality and that brings what can only be considered to be unpredictable results.

Moreover, this defensive approach insulates you from any changes that Bruce’s patch might make.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Bruce Momjian 2021-07-22 01:43:31 Re: Have I found an interval arithmetic bug?
Previous Message Bruce Momjian 2021-07-22 00:07:13 Re: Have I found an interval arithmetic bug?

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2021-07-22 01:43:31 Re: Have I found an interval arithmetic bug?
Previous Message houzj.fnst@fujitsu.com 2021-07-22 00:53:45 RE: [PATCH] Use optimized single-datum tuplesort in ExecSort