From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | IANA timezone abbreviations versus timezone_abbreviations |
Date: | 2024-12-12 23:33:04 |
Message-ID: | 957280.1734046384@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Although we've never documented this, it's been true for ages
that timestamptz_out shows timezone abbreviations that are taken
from the IANA tzdb data (in datestyles that use non-numeric
timezone fields, which is all but ISO). Meanwhile, timestamptz_in
recognizes timezone abbreviations if they match an entry in the
timezone_abbreviations file. Those two sources of truth are not
entirely consistent, leading to fun results like these:
regression=# set datestyle = postgres;
SET
regression=# set timezone = 'America/Montevideo';
SET
regression=# select '1900-01-01 00:00'::timestamptz;
timestamptz
------------------------------
Mon Jan 01 00:00:00 1900 LMT
(1 row)
regression=# select '1900-01-01 00:00'::timestamptz::text::timestamptz;
ERROR: invalid input syntax for type timestamp with time zone: "Mon Jan 01 00:00:00 1900 LMT"
(because LMT for "local mean time" is not in timezone_abbreviations)
regression=# select '1912-01-01 00:00'::timestamptz;
timestamptz
------------------------------
Mon Jan 01 00:00:00 1912 MMT
(1 row)
regression=# select '1912-01-01 00:00'::timestamptz::text::timestamptz;
timestamptz
------------------------------
Sun Dec 31 13:45:09 1911 MMT
(1 row)
(because this zone uses MMT for "Montevideo Mean Time" while
timezone_abbreviations thinks it means "Myanmar Time").
You can get unfortunate results even for current timestamps,
because we've not hesitated to put some North-American-centric
interpretations into the default abbreviations list:
regression=# set timezone = 'Asia/Shanghai';
SET
regression=# select '2024-12-13 00:00'::timestamptz;
timestamptz
------------------------------
Fri Dec 13 00:00:00 2024 CST
(1 row)
regression=# select '2024-12-13 00:00'::timestamptz::text::timestamptz;
timestamptz
------------------------------
Fri Dec 13 14:00:00 2024 CST
(because this IANA zone uses CST for "China Standard Time"
while timezone_abbreviations thinks it means (USA) "Central
Standard Time").
This mess was brought up in pgsql-bugs [1], but the solution
I propose here is invasive enough that I think it needs
discussion on -hackers.
What I think we should do about this is to teach timestamp
input to look into the current IANA time zone to see if it
knows the given abbreviation, and if so use that meaning
regardless of what timezone_abbreviations might say. This
isn't particularly hard, and it doesn't appear to cost
anything speed-wise, but is there anybody out there who
is relying on the current behavior?
I can imagine that somebody might be using an interpretation
that is contrary to IANA's ideas; but it seems fairly unlikely
with current IANA data, because they largely got rid of the
made-up abbreviations their data used to be full of. Anyplace
where we find a non-numeric abbreviation in the IANA data is
probably someplace where that abbreviation is widely current,
and people wouldn't expect it to mean something different.
On the positive side, this gives us a far better story for
abbreviations that conflict in different parts of the world.
timestamptz_in will now automatically do the right thing
given a correct timezone setting, without having to manually
adjust the abbreviation list. So between that and getting
rid of the round-trip hazards seen above, I think there is
sufficient reason to do this.
The only other way I can envision to remove the round-trip hazard
is to stop using alphabetic abbreviations at all in timestamp
output, and use numeric GMT offsets regardless of datestyle.
I doubt that would make many people happy. It would certainly
break a bunch of our own regression tests, and I expect it would
break other people's applications too.
(To be clear, I'm only proposing this for v18 not for back-patch.
While it's certainly fixing live bugs, there have not been that
many complaints, and the behavioral change is surely more than
we want for a back branch.)
Draft patches attached. Any thoughts?
regards, tom lane
Attachment | Content-Type | Size |
---|---|---|
v1-0001-Seek-zone-abbreviations-in-the-IANA-data-before-t.patch | text/x-diff | 16.5 KB |
v1-0002-Improve-DecodeTimezoneAbbrev-s-caching-logic.patch | text/x-diff | 6.0 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Michail Nikolaev | 2024-12-12 23:59:16 | Re: bt_index_parent_check and concurrently build indexes |
Previous Message | Peter Smith | 2024-12-12 23:11:58 | Re: Adding a '--two-phase' option to 'pg_createsubscriber' utility. |