Re: timestamp and timestamptz

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Niels Jespersen <NJN(at)dst(dot)dk>
Cc: "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: timestamp and timestamptz
Date: 2020-04-16 08:27:57
Message-ID: CABUevExZJ1XhWJxrDbqiNxy_=W-bZrWbgg3k4fFuyews74PjZA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Thu, Apr 16, 2020 at 6:12 AM Niels Jespersen <NJN(at)dst(dot)dk> wrote:

>
>
>
>
> *Fra:* Magnus Hagander <magnus(at)hagander(dot)net>
> *Sendt:* 15. april 2020 20:05
> *Til:* Niels Jespersen <NJN(at)dst(dot)dk>
> *Cc:* pgsql-general(at)lists(dot)postgresql(dot)org
> *Emne:* Re: timestamp and timestamptz
>
>
>
>
>
>
>
> On Wed, Apr 15, 2020 at 7:50 PM Niels Jespersen <NJN(at)dst(dot)dk> wrote:
>
> Hello all
>
>
>
> We have some data that have entered a timestamp column from a csv. The
> data in the csv are in utc. We want to access the data in our native
> timezone (CET).
>
>
>
> I am considering a few alternatives:
>
>
>
> 1. Early in the process, convert to timestamptz and keep this
> datatype.
>
> 2. Early in the process, convert to timestamp as understood in
> CET. This will imply by convention that the data in the timestamp column
> represents CET. Users will need to be told that data represents CET, even
> if data is somwhere in the future kept in another country in another
> timezone.
>
>
>
> I probably should choose 1 over 2. But I am a bit hesitant, probably
> because we almost never have used timestamptz.
>
>
>
> Yes, you should.
>
>
>
>
>
> Can we agree that the below query is selecting both the original utc
> timestamps and 2 and 1 (as decribed above)?
>
>
>
> set timezone to 'cet';
>
> select read_time read_time_utc, (read_time at time zone 'utc')::timestamp
> read_time_cet, (read_time at time zone 'utc')::timestamptz read_time_tz
> from t limit 10;
>
>
>
>
>
> As long as you use option 1:
>
>
>
> SELECT read_time
>
> will return the time in CET (as a timestamptz) after you've set timezone
> to 'cet'. If you set timezone to 'utc' it will directly return utc.
>
>
>
> SELECT read_time AT TIME ZONE 'utc'
>
> will return the time in UTC (as a timestamp)
>
>
>
>
>
> And just make sure you have done a "set time zone 'utc'" before you *load*
> the data, and everything should just work automatically.
>
>
>
> --
>
> Magnus Hagander
> Me: https://www.hagander.net/ <http://www.hagander.net/>
> Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
>
>
>
>
>
> Thank you Magnus (and others) for your replies.
>
>
>
> The raw input data are in this, slightly strange format: 2019.05.01
> 00:00:00. No timezone indicator, just an informal guarantee from the
> supplier that it is indeed utc. And no real chance of changing the format.
> We know, from experience.
>
>
>
> The data volume is a bit high, a few billion rows pr month. So, table
> partitioning is very helpful (aka really indispensable). Data will be
> aggregated in several ways for analytics. Time aggregations must be
> according to our local timezone (cet). We do not want data from one day
> being aggregated into the wrong date because of timezone issues. This means
> that partition boundaries (monthly pratitions most often, sometimes day
> partitions) must be on CET-boundaries so that partition pruning will pull
> data from the relevant cet-month not the utc-month.
>
>
>
> Now, if I load data into a timestamptz with timezone set to utc, partition
> to cet-boundaries, query and aggredate with timezone set to cet, everything
> wil be ok, I think. My small testcase below shows that the row goes into
> the april-partition (as it should). The planner does the correct partition
> pruning according to specified filtering and set timezone. All good.
>
>
>
> create table t (t_id bigserial, ts timestamptz) partition by range (ts);
>
> create table t_2020_02 partition of t for values from ('2020-02-01
> 00:00:00+01') to ('2020-03-01 00:00:00+01');
>
> create table t_2020_03 partition of t for values from ('2020-03-01
> 00:00:00+01') to ('2020-04-01 00:00:00+02');
>
> create table t_2020_04 partition of t for values from ('2020-04-01
> 00:00:00+02') to ('2020-05-01 00:00:00+02');
>
>
>
> set timezone to 'utc';
>
> insert into t (ts) values('2020-03-31 23:30:00');
>
>
>
> Once again, thank you for invaluable feedback.
>
>
>

Yes, this should work just fine. The internal representation of timestamptz
is always UTC, and it's only converted on entry/exit.

You can see this clearly if you create your partitions like above, and then
do a "set timezone to 'America/Los_Angeles'" followed by \d+ t in psql.
This will now show you what the partition bounds are in that timezone.

You can also just specify the timestamps when you create your partition
without including the timezone (+01) specifier. In this PostgreSQL will
interpret it as whatever your current value for the timezone setting is, so
as long as it's CET it should work fine, and you don't have to remember
which months are in DST and which are not.

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Alex Magnum 2020-04-16 08:35:49 Recursive Queries
Previous Message raf 2020-04-16 08:23:05 Re: timestamp and timestamptz