Re: Unexpected interval comparison

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: adrian(dot)klaver(at)aklaver(dot)com, frazer(at)frazermclean(dot)co(dot)uk, pgsql-general(at)postgresql(dot)org
Subject: Re: Unexpected interval comparison
Date: 2017-04-05 11:07:23
Message-ID: 20170405.200723.102041631.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Mmm. It's shameful.

At Tue, 04 Apr 2017 18:06:38 -0400, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote in <5084(dot)1491343598(at)sss(dot)pgh(dot)pa(dot)us>
> Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> writes:
> > The first attached is the revised patch and the second is
> > temporary sanity check code for non-128bit environment code. (but
> > works only on 128 bit environment)
>
> This seemed to me to be probably even less correct, so I extracted
> the addition and multiplication logic into a standalone test program
> (attached), which compares the result of a multiplication to that
> of native int128 arithmetic. I changed the order of the LinearInterval
> fields to be LS-first so that I could overlay them onto an int128
> result (on a little-endian machine); this is just for testing purposes
> not something we must do in the finished code. I soon found cases
> where it indeed fails, eg
>
> $ ./a.out 0x7ffffffff 0x7ffffffff
> 7FFFFFFFF * 7FFFFFFFF
> result = 62 18446744004990074881
> result = 3E FFFFFFF000000001
> MISMATCH!
> result = 63 18446744004990074881
> result = 3F FFFFFFF000000001

I admit that I was careless about that.

> After fooling with it for awhile, I decided that the cause of the
> problems was basically not thinking carefully about the lower half
> of the value being unsigned: that affects when to do carries in
> the addition macro, and we also have to be careful about whether
> or not to sign-extend the partial product terms. The second
> attached file is a version that I can't break anymore, though I'm
> not quite sure it's bug-free.

In the first version, I converted all operands to positive
numbers and finally correct the sign of the result. This was
straightforward but doesn't looked clean so I tried direct
handling of singed values. It was the beginning of this mess.

There was a time when the lower bits is in uint64 but I was
confused by arithmetics mixing signed and unsigned. I realized
that I misunderstood composing 64 bit value from decomposition of
a negative int64.

I reconsidered this based on Tom's test program and try to give
reasoning for the algorithm of the attached new version.

1. Now INT64_AL32 returns explicitly casted into uint64 (for
safety). the upper and lower 32 bit values surely makes the
original value just by INT64_AU32 << 32 + INT64_AL32 because

1.1. The arithmetic is done assuming that both operands of the
addition are in signed int64 but the MSB of the right
operand is always 0 so no difference by reading it as
singed.

- As mentioned in added comment, all terms (stored in the
variable tmp) are the products of two signed/unsigned 32-bit
values expanded to singed int64 variables. This is safe.

- The second and third terms should be left-shifted by 32 bit in
virtually-128-bit storage then added to exiting 128 bit
value. This can be said as adding INT128_AU64(tmp<<32) into hi
part. If INT128_AU64(tmp<<32) is equivalent to
INT64_AU32(tmp)>>32, "span.hi += INT64_AU32(tmp)" is safe.

INT128_AU64(tmp << 32) ; tmp is assumed signed int128 here
= ((tmp << 32) >> 64)
= tmp >>32
= INT64_AU32(tmp) ; here, tmp is safe even if singed int64

Similary,

INT128_AL64(tmp << 32)
= (uint128)(tmp << 32) & 0xFFFFFFFF_FFFFFFFF (lower 32 bits are always 0)
= ((uint64)(tmp) & 0xFFFFFFFF) << 32
= INT64_AL32(tmp) << 32

- The last thing I should confirm is that
LINEARINTERVAL_ADD_UINT64 is correct. This is analogous to 1
above.

(int128)x + (uint64)y
= (int128)x + (int128)y ; safely extended without change
= (INT128AU64(x) << 64 + INT128AL64(x))
+ (INT128AU64(y) << 64 + INT128AL64(y))
= (INT128AU64(x) + INT128AU64(y)) << 64 ; performed as signed
+ (INT128AL64(x) + INT128AL64(y)) ; performed as unsigned

Where (INT128AL64(x) + INT128AL64(y)) is performed as unsigned
and can be overflow, and the carry can be just pushed to the
upper 64bit.

2. Adding two values with MSB of 0 doesn't overflow.

3. If at least one of the two has MSB of 1, it can be overflow.

3.1. Both have MSB of 1, it must overflow.

3.2. If one of the two has MSB of 1, the maximum overflowed
value is made by 0xFFF...FF + 0x7FF...FF and result is
0x(1)7FF..FE so "MSB of the result is 0" and "overflowed" is
equivalent for the case.

Addition to all of the above, dayfraction should be positive so
that LINEARINTERVAL_ADD_UINT64 can be used.

The attached patch is the revised version.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
0001-Fix-overflow-during-interval-comparison_20170405.patch text/x-patch 10.4 KB

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Bill Moran 2017-04-05 11:54:33 Re: Keycloak and Postgres
Previous Message Albe Laurenz 2017-04-05 10:34:22 Re: expensive function in select list vs limit clause