Re: Bug in to_timestamp().

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, amul sul <sul_amul(at)yahoo(dot)co(dot)in>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Bug in to_timestamp().
Date: 2016-06-20 12:58:23
Message-ID: CAKFQuwbyB0w2_g9y5FGebBMUU66v-v_BqRyV29tSwG2rbWJb2g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jun 20, 2016 at 8:19 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Mon, Jun 13, 2016 at 12:25 PM, Robert Haas <robertmhaas(at)gmail(dot)com>
> wrote:
> > On Mon, Jun 13, 2016 at 12:12 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> amul sul <sul_amul(at)yahoo(dot)co(dot)in> writes:
> >>> It's look like bug in to_timestamp() function when format string has
> more whitespaces compare to input string, see below:
> >>
> >> No, I think this is a case of "input doesn't match the format string".
> >>
> >> As a rule of thumb, using to_timestamp() for input that could be parsed
> >> just fine by the standard timestamp input function is not a particularly
> >> good idea. to_timestamp() is meant to deal with input that is in a
> >> well-defined format that happens to not be parsable by timestamp_in.
> >> This example doesn't meet either of those preconditions.
> >
> > I think a space in the format string should skip a whitespace
> > character in the input string, but not a non-whitespace character.
> > It's my understanding that these functions exist in no small part for
> > compatibility with Oracle, and Oracle declines to skip the digit '1'
> > on the basis of an extra space in the format string, which IMHO is the
> > behavior any reasonable user would expect.
>
> So Amul and I are of one opinion and Tom is of another. Anyone else
> have an opinion?
>
>
​At least Tom's position has the benefit of being consistent with current
behavior. The current implementation doesn't actually care what literal
value you specify - any non-special character consumes a single token from
the input, period.

SELECT TO_TIMESTAMP('2016-06-13 15:43:36', 'YYYY/MM/DD--HH24:MI:SS');
SELECT TO_TIMESTAMP('2016-06-13 15:43:36', 'YYYY/MM/DD-HH24:MI:SS');

Both the above exhibit the same behavior as if you used a space instead of
the hyphen as the group separator.

The documentation should be updated to make this particular dynamic more
clear.

I don't see changing the general behavior of these "date formatting"
functions a worthwhile endeavor.​ Adding a less-liberal "parse_timestamp"
function I could get behind.

IOW, the user seems to be happy with the fact that the "/" in the date
matches his "-" but them complains that the space matches the number "1".
You don't get to have it both ways.

[re-reads the third usage note]

Or maybe you do. We already define space as a being a greedy operator
(when not used in conjunction with FX). A greedy space-space sequence
makes little sense on its face and if we are going to be helpful here we
should treat it as a single greedy space matcher.

Note that "returns an error because to_timestamp expects one space only" is
wrong - it errors because only a single space is captured and then the
attempt to parse ' JUN' using "MON" fails. The following query doesn't
fail though it exhibits the same space discrepancy (it just gives the same
"wrong" result).

SELECT TO_TIMESTAMP('2016-06-13 15:43:36', 'FXYYYY/MM/DD HH24:MI:SS');

Given that we already partially special-case the space expression I'd be
inclined to consider Robert's and Amul's position on the matter. I think
I'd redefine our treatment of space to be "zero or more" instead of "one or
more" and require that it only match a literal space in the input.

Having considered that, I'm not convinced its worth a compatibility break.
I'd much rather deprecate these <to_*> versions and write
slightly-less-liberal versions named <parse_*>.

In any case I'd called the present wording a bug. Instead:

A single space consumes a single token of input and then, unless the FX
modifier is present, consumes zero or more subsequent literal spaces.
Thus, using two spaces in a row without the FX modifier, while allowed, is
unlikely to give you a satisfactory result. The first space will consume
all available consecutive spaces so that the second space will be
guaranteed to consume a non-space token from the input.

David J.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2016-06-20 13:36:11 Re: Bug in to_timestamp().
Previous Message Robert Haas 2016-06-20 12:19:12 Re: Bug in to_timestamp().