From: | Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru> |
---|---|
To: | Artur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru> |
Cc: | "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, amul sul <sulamul(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Alex Ignatov <a(dot)ignatov(at)postgrespro(dot)ru>, Bruce Momjian <bruce(at)momjian(dot)us>, amul sul <sul_amul(at)yahoo(dot)co(dot)in>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [HACKERS] Bug in to_timestamp(). |
Date: | 2018-08-02 15:17:01 |
Message-ID: | CAPpHfdtqOSniGJRvJ2zaaE8=eMB8XDnzvVS-9c3Xufaw=iPA+Q@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Aug 1, 2018 at 3:17 PM Arthur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru> wrote:
> On Mon, Jul 23, 2018 at 05:21:43PM +0300, Alexander Korotkov wrote:
> > Thank you, Arthur. These examples shows downside of this patch, where
> > users may be faced with incompatibility. But it's good that this
> > situation can be handled by altering format string. I think these
> > examples should be added to the documentation and highlighted in
> > release notes.
>
> I updated the documentation. I added a tip text which explains
> how to_timestamp() and to_date() handled ordinary text prior to
> PostgreSQL 11 and how they should handle ordinary text now.
>
> There is now changes in the code and tests.
Thank you, Arthur!
I made following observations on Oracle behavior.
1) Oracle also supports parsing TZH in to_timestamp_tz() function.
Previously, in order to handle TZH (which could be negative) we
decided to skip spaces before fields, but not after fields [1][2].
That leads to behavior, which is both incompatible with Oracle and
unlogical (at least in my opinion). But after exploration of
to_timestamp_tz() behavior I found that Oracle can distinguish minus
sign used as separator from minus sign in TZH field.
# select to_char(to_timestamp_tz('2018-01-01 -10', 'YYYY MM DD TZH'),
'YYYY-MM-DD HH24:MI:SS TZH:TZM') from dual
2018-01-01 00:00:00 +10:00
# select to_char(to_timestamp_tz('2018-01-01--10', 'YYYY MM DD TZH'),
'YYYY-MM-DD HH24:MI:SS TZH:TZM') from dual
2018-01-01 00:00:00 +10:00
# select to_char(to_timestamp_tz('2018-01-01 -10', 'YYYY MM DD
TZH'), 'YYYY-MM-DD HH24:MI:SS TZH:TZM') from dual
2018-01-01 00:00:00 -10:00
# select to_char(to_timestamp_tz('2018-01-01---10', 'YYYY MM DD
TZH'), 'YYYY-MM-DD HH24:MI:SS TZH:TZM') from dual
2018-01-01 00:00:00 -10:00
So, if number of spaces/separators between fields in input string is
more than in format string and list space/separator skipped is minus
sign, then it decides to glue that minus sign to TZH. I think this
behavior is nice to follow, because it allows us to skip spaces after
fields.
2) It appears that Oracle skips spaces not only before fields, but
also in the beginning of the input string.
SELECT to_timestamp(' -2018', ' YYYY') FROM dual
# 01.08.2018 00:00:00
In the example given it seems that first space is skipped, while minus
sign is matched to space.
3) When multiple separators are specified in the format string, Oracle
doesn't allow skipping arbitrary number of spaces before each
separator as we did.
# SELECT to_timestamp('2018- -01 02', 'YYYY--MM-DD') FROM dual
ORA-01843: not a valid month
4) Spaces and separators in format string seem to be handled in the
same way in non-FX mode. But strange things happen when you mix
spaces and separators there.
# SELECT to_timestamp('2018- -01 02', 'YYYY---MM-DD') FROM dual
02.01.2018 00:00:00
# SELECT to_timestamp('2018- -01 02', 'YYYY MM-DD') FROM dual
02.01.2018 00:00:00
# SELECT to_timestamp('2018- -01 02', 'YYYY- -MM-DD') FROM dual
ORA-01843: not a valid month
After some experiments I found that when you mix spaces and separators
between two fields, then Oracle takes into account only length of last
group of spaces/separators.
# SELECT to_timestamp('2018- -01 02', 'YYYY---- --- --MM-DD') FROM
dual2018-01-01 00:00:00 -10:00
(length of last spaces/separators group is 2)
# SELECT to_timestamp('2018- -01 02', 'YYYY---- --- --MM-DD') FROM dual
2018-01-01 00:00:00 -10:00
(length of last spaces/separators group is 3)
# SELECT to_timestamp('2018- -01 02', 'YYYY---- -- ---MM-DD') FROM dual
02.01.2018 00:00:00
(length of last spaces/separators group is 2)
1+2+3 are implemented in the attached patch, but not 4. I think that
it's only worth to follow Oracle when it does reasonable things.
I also plan to revise documentation and regression tests in the patch.
But I wanted to share my results so that everybody can give a feedback
if any.
1. https://www.postgresql.org/message-id/CAEepm%3D1Y7z1VSisBKxa6x3E-jP-%2B%3DrWfw25q_PH2nGjqVcX-rw%40mail.gmail.com
2. https://www.postgresql.org/message-id/20180112125848.GA32559%40zakirov.localdomain------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment | Content-Type | Size |
---|---|---|
0001-to-timestamp-format-checking-v14.patch | application/octet-stream | 15.1 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Andres Freund | 2018-08-02 16:03:09 | Re: Ideas for a relcache test mode about missing invalidations |
Previous Message | Nikhil Sontakke | 2018-08-02 15:09:19 | Re: [HACKERS] logical decoding of two-phase transactions |