Re: [HACKERS] [PATCH] Improve geometric types

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: emre(at)hasegeli(dot)com
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, alvherre(at)alvh(dot)no-ip(dot)org, robertmhaas(at)gmail(dot)com, a(dot)alekseev(at)postgrespro(dot)ru, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] [PATCH] Improve geometric types
Date: 2018-01-19 08:26:46
Message-ID: 20180119.172646.207924292.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello,

At Thu, 18 Jan 2018 16:01:01 +0100, Emre Hasegeli <emre(at)hasegeli(dot)com> wrote in <CAE2gYzwgcan3w3=-3oHa4Efif=0yvoErn-e_V5KJLUi9pd8ivQ(at)mail(dot)gmail(dot)com>
> > 0001:
> >
> > - You removed the check of parallelity check of
> > line_interpt_line(old line_interpt_internal) but line_parallel
> > is not changed so the consistency between the two functions are
> > lost. It is better to be another patch file (maybe 0004?).
>
> I am making line_parallel() use line_interpt_line(). What they do is
> exactly the same.

Thanks.

> > - + Assert(p1->npts > 0 && p2->npts > 0);
> >
> > Other path_ functions are allowing path with no points so just
> > returning false if (p1->npts == 0 || p2->npts == 0) seems
> > enough. Assertion should be used only for something server
> > cannot continue working. (division by zero doesn't crash
> > server) I saw other similar assertions in the patch.
>
> path_in() and path_recv() reject paths with no points. I thought we
> shouldn't spend cycles for things that cannot happen. I can revert
> this part if you find previous practice better.

I'm fine with the current shape. Thanks. Maybe the same
discussion applies to polygons. (cf. poly_overlap)

> > 0003:
> >
> > close_pl/ps/lseg/pb/ls/sb have changed to return null when
> > lseg_closept_line returns NAN, but they are defined as strict
> > and that is reasonable. Anyway pg_hypot returns NaN only when
> > parameters contain INF or NAN so we should error out when it
> > returns nan.
>
> I though strict is only related to the input being NULL. Tom Lane

Oops. Yes. you're rigtht.

> made close_ps() return NULL with commit
> 278148907a971ec7fa4ffb24248103d8012155d2. The other functions

Thank you for the pointer. By the way,

https://www.postgresql.org/message-id/1919.1468269494%40sss.pgh.pa.us

| > Is it reasonable to disallow NaN coordinates on the next major
| > release. Are there any reason to deal with them?
|
| I don't see how we can do that; what would you do about tables already
| containing NaNs? Even without that consideration, assuming that there's
| no way a NaN could creep in seems a pretty fragile assumption, considering
| that operations like Infinity/Infinity will produce one.

Ok, it is convicing. The policy is don't do anything that is not
harmful to server. Currently close_* are *strict* so what we
should do is avoid returning '(anything *)NULL' as a result.

> currently fail with elog()s from DirectFunctionCalls. I don't have
> strong preference for NULL or an error. Could you please suggest an
> errcode and errmsg, if you have?

=# select close_sb('((nan,0),(1,1))'::lseg, '((-1,-1),(2,2))'::box);
ERROR: function 0x8fe61c returned NULL

Recosdering on it and I came to think that such usage of SQL null
is the same to "invalid" objects I mentioned upthread. But we
don't actively check component NaNs but if we happen to see an
invalid result, return SQL null instead.

In this criteria close_* functions looks good that return SQL
null.

0003:

line_closept_point asserts line_interpt_line returns true but it
is hazardous. It is safer if line_closept_point returns NaN if
line_interpt_line returns false.

All other functions looks good in the criteria.

> > I understand that we don't need to consider NAN is equal to NAN.
> > circle_same, point_eq_point, line_eq no longer needs such
> > change?
>
> I though it would be better to consider NaNs equal only on the
> equality operators. That is why only they are changed that way.

I'm convinced by that.

> > 0004:
> >
> > Looking line_recv's change, I became to think that FPxx macros
> > is provided for coordinate values. I don't think it is applied
> > to non-coordinate values like coefficients. If some kind of
> > tolerance is needed here, it is not the same with the EPSILON.
> >
> > + if (FPzero(line->A) && FPzero(line->B))
> > + ereport(ERROR,
> >
> > So, the above should be exact comparison with 0.0 and line_in
> > also should be the same. And maybe the same thing should be done
> > at many places..
>
> I agree you. EPSILON is currently applied prematurely. I am trying
> to stay away from EPSION related issues to improve the chances to get
> this part committed.

Agreed.

> > But the following line of line_parallel still requires some kind
> > of tolerance to work properly. Since this patch is an
> > imoprovement patch, I think we can change it to the vector method.
>
> I am making line_parallel() use line_interpt_line() in response to
> your first remark. Vector based algorithm is probably better for
> every use of line_interpt_line(), but it is a bigger change with more
> user visible effects.

I'm fine with that.

> > The problem of line_distance is an existing one so we can leave
> > it alone. It returns 0.0 for the most cases but it is a
> > long-standing behavior.. (Anyway I don't find a reasonable
> > definition of the distance between very-nearly parallel lines..)
>
> Exact comparison with 0.0 instead of FPzero() would also be an
> improvement for line_distance(), but I am not doing it now.

reagrds,

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Yuto Hayamizu 2018-01-19 08:27:33 Re: [HACKERS] [PATCH] Overestimated filter cost and its mitigation
Previous Message Simon Riggs 2018-01-19 08:25:30 Re: Rangejoin rebased