Re: to_timestamp() and timestamp without time zone

From: "David Johnston" <polobo(at)yahoo(dot)com>
To: "'Adrian Klaver'" <adrian(dot)klaver(at)gmail(dot)com>, "'hernan gonzalez'" <hgonzalez(at)gmail(dot)com>
Cc: "'Steve Crawford'" <scrawford(at)pinpointresearch(dot)com>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: to_timestamp() and timestamp without time zone
Date: 2011-06-23 21:45:56
Message-ID: 012201cc31ee$ef500cc0$cdf02640$@yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

> -----Original Message-----
> >
> > Every feature and function in PostgreSQL is "potentially dangerous"
> > - understanding them and using them correctly is the responsibility
> > of the programmer. Time handling has lots of subtleties that take
> > time to digest
> >
> >
> > Thanks for the advice. But it's precisely in the role of a programmer
> > who has digested a good deal about date-time data and its subtleties,
> > and who is trying to use in a consistent an robust way date-time data
> > that I'm asking this question. Or rather, reporting this issue.
> >
> > . It appears that you would like a timestamp of 2011-12-30 00:30:00
> > which you can get. But even so, there are places in the world where
> > that time exists and other places in the world that it does not.
> >
> > If you try to force that timestamp into a zone where it doesn't
> > exist, PostgreSQL makes a reasonable interpretation of the intended
> > point in time.
> >
> >
> > I strongly disagree. I'm not trying "to force that timestamp into a
> > zone" at all. I'm just telling postgresl to parse the string
> > '30/12/2011 00:30:00' as a TIMESTAMP (without time zone), that is, to
> > parse/understand/store it as the abstract/civil (wall calendar+clock)
> > local datetime "30 dec 2011, 00 30 00 am" with NO association with a
> > timezone.

OK, let us try a different approach (summary located at the end for
convenience)

If you ignore TimeZone then you are saying "EVERY (well-formed) TIME STRING
IS VALID". It is only within a TimeZone that specific times can be
considered "Invalid" according to the rules for that TimeZone. Or rather,
in the absence of TimeZone rules all times are valid.

However, to_timestamp(text,text) is a "TimeZone" dependent function and so
you cannot pass in a time that is invalid in the "current" TimeZone. The
alternative to using "to_timestamp(text,text)" is to "SET datestyle = ..."
and using casting; which while valid, is yet another hoop to jump through by
the programmer. Since the to_timestamp(text, text) function exists (which
could also be emulated by using "SET datestyle") apparently the practice of
requiring "SET datestyle" was determined undue hardship and thus the
function was written.

Fine. If you DO pass in a time that is invalid you can either throw an
error or coerce the value into a valid time. Neither decision is best and
it would be nice to give the user the ability to choose the desired
behavior. Also, the option to "warn" instead of error would be nice. But,
given the existing decision to coerce (without warning) instead of throwing
an exception the question comes down to:

Is there a TimeZone in which ALL TIMES ARE VALID? If there is not then any
input into "to_timestamp(text, text)" is potentially invalid and thus at
risk for SILENT COERCION to an invalid value. If that is the case; what is
the proper and fool-proof method to get the "invalid" timestamp back? If
there is you still require the user to both know which TimeZone that is and
to issue a "SET TIMEZONE" prior to any call of "to_timestamp(text, text)" if
they do indeed want to treat all "well-formed time strings" as being valid.

In the example, "December 30, 2007 12:30:00 AM" is not a valid DateTime in
"Buenos_Aires" and so the coercion occurs and "1:30:00 AM" is returned
instead. The same time in 2006 and 2008 ARE valid and so the issue does not
appear if you use those data points. This itself seems strange but as I am
not from Argentina whether this is correct or buggy behavior I cannot say.
Mind you I am running 9.0.4 on Windows 7 64bit.

PostgreSQL currently does not offer a to_timestamp(text, text) like-function
that evaluates/returns a simple "Timestamp" as opposed to a "TimestampTZ" -
using the supplied function always evaluates the input time relative to the
in-session TimeZone and thus could cause the time to be coerced if the input
time is invalid in that TimeZone.

Even if you are afraid to change the behavior of the existing
to_timestamp(text, text) function having TimeZone agnostic functions, that
always evaluate relative to the "Safe" TimeZone, and output a "Timestamp
Without Time Zone" seems reasonable to avoid having people code a custom
function just so they can "SET TIMEZONE = 'UTC'" prior to calling the
existing to_timestamp(text, text) function.

I do think throwing a warning during coercion would be nice so at least
those programs relying on to_timestamp(text, text) will know WHEN it happens
and can react accordingly.

As for "Time handling has lots of subtleties that take time to digest"; a
good programmer and API do their best to minimize the number of hidden
subtleties to be learned. Even if the SQL standard doesn't properly address
the issue doesn't mean the PostgreSQL implementation shouldn't strive to do
things better and make life easier for the programmer. Reading paragraphs
of text to learn how something works (and how to work around its
limitations) is not as good as seeing multiple groups of functions that are
all similar but in which each group provides unique abilities and
restrictions. In this case seeing "to_timestamp_local(text, text)" and
"to_timestamp(text, text)" would make it much more obvious to the user that
special considerations are present when dealing with "timestamptz" (not all
times are valid) compared to "timestamp" (all times are considered valid)
and that the necessary code to implement the specific behavior is hidden
behind each function - whatever that code is. I would maybe even add a
"to_timestamp_strict(text, text)" function which handles timestamptz values
but errors instead of coercing. The "to_timestamp(text, text)" would coerce
but would RAISE NOTICE when it happens so those who care (or who are
oblivious) will know about it. I cannot imagine that so many invalid
timestamptz values are input as to "overflow the logs" even if no-one is
looking - and since it is a runtime data issue there really isn't an
external tool that can look at the static database and evaluate things
"out-of-transaction".

I know exploding the API can be just as bad as having too minimal of one but
just from the fact that the defenders of the status-quo feel that "Time
handling has lots of subtleties..." means that efforts to make it less
subtle are warranted.

Yes, I do sit in the Peanut Gallery but, like the OP, am not a novice when
it comes to programming; and for much of that last paragraph I am liberally
regurgitating from reading/learning.

Workarounds:
1) SET datestyle = 'DMY'; + use "CAST ( '30/12/2007 00:30:00' AS timestamp
)"; at this moment this is the most direct (and possibly only) way to do
this [not positive all formats can be represented however...]; Casting is
fairly smart (I think) so you really just need to get the date ORDER correct
as opposed to explicitly specifying the format. Time should be fairly
straight forward as order does not normally come into play.

2) SET TIMEZONE = 'Some Safe TimeZone Where All Times Are Valid' +
to_timestamp(text, text)::timestamp; if you are going to use "SET" you might
as well just change "datestyle" if your desired format can be represented in
that form

2a) If no "Safe" TimeZone exists then either a "correction algorithm" is
required or the only option is to "SET datestyle" and CAST

3) Consider other "SET" possibilities (lc_time, intervalstyle) for more
precision [not tested]

Solution:
1) Add "to_timestamp_local(text, text) -> timestamp" function which does the
same thing as "to_timestamp(text, text) -> timestamptz" but in a TimeZone
agnostic way

Modifications:
1) Add "to_timestamp_strict(text, text) -> timestamptz" function that errors
instead of coercing
2) Modify "to_timestamp(text, text) -> timestamptz" to emit a warning
whenever a coercion occurs

David J.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message David Johnston 2011-06-23 22:26:48 Re: unique across two tables
Previous Message hernan gonzalez 2011-06-23 20:42:42 Re: to_timestamp() and timestamp without time zone