From: | Brendan Jurd <direvus(at)gmail(dot)com> |
---|---|
To: | Suraj Kharage <suraj(dot)kharage(at)enterprisedb(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Issue in to_timestamp/to_date while handling the quoted literal string |
Date: | 2019-07-24 12:54:39 |
Message-ID: | CADxJZo2WbU8w+vBHA9fZ7HkfVnwABMwe3N6+t5BwsLATi8vNbw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi Suraj,
I think the documentation is reasonably clear about this behaviour, quote:
" In to_date, to_number, and to_timestamp, literal text and double-quoted
strings result in skipping the number of characters contained in the
string; for example "XX" skips two input characters (whether or not they
are XX)."
I can appreciate that this isn't the behaviour you intuitively expected
from to_timestamp, and I don't think you'd be the first or the last. The
purpose of these functions was never to validate that your input string
precisely matches the non-coding parts of your format pattern. For that, I
think you'd be better served by using regular expressions.
Just as an aside, in the example you gave, the string '2019-05-24T23:12:45'
will cast directly to timestamp just fine, so it isn't the kind of
situation to_timestamp was really intended for. It's more for when your
input string is in an obscure (or ambiguous) format that is known to you in
advance.
I hope that helps.
Cheers
Brendan
On Wed, 24 Jul 2019 at 21:38, Suraj Kharage <suraj(dot)kharage(at)enterprisedb(dot)com>
wrote:
> Hi,
>
> I noticed the issue in to_timestamp()/to_date() while handling the double
> quote literal string. If any double quote literal characters found in
> format, we generate the NODE_TYPE_CHAR in parse format and store that
> actual character in FormatNode->character. n DCH_from_char, we just
> increment the input string by length of character for NODE_TYPE_CHAR.
> We are actually not matching these characters in input string and because
> of this, date values get changed if quoted literal string is not identical
> in input and format string.
>
> e.g:
>
> postgres(at)78619=#select to_timestamp('2019-05-24T23:12:45',
> 'yyyy-mm-dd"TT"hh24:mi:ss');
> to_timestamp
> ---------------------------
> 2019-05-24 03:12:45+05:30
> (1 row)
>
>
> In above example, the quoted string is 'TT', so it just increment the
> input string by 2 while handling these characters and returned the wrong
> hour value.
>
> My suggestion is to match the exact characters from quoted literal string
> in input string and if doesn't match then throw an error.
>
> Attached is the POC patch which almost works for all scenarios except for
> whitespace - as a quote character.
>
> Suggestions?
> --
> --
>
> Thanks & Regards,
> Suraj kharage,
> EnterpriseDB Corporation,
> The Postgres Database Company.
>
From | Date | Subject | |
---|---|---|---|
Next Message | Daniel Verite | 2019-07-24 12:59:05 | Re: psql - add SHOW_ALL_RESULTS option |
Previous Message | Jehan-Guillaume de Rorthais | 2019-07-24 12:33:27 | Re: Fetching timeline during recovery |