Re: Strategies for converting UTC data to local windows for arbitrary resolutions and timezones

From: Lincoln Swaine-Moore <lswainemoore(at)gmail(dot)com>
To: Marian Wendt <marian(dot)wendt(at)yahoo(dot)com>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Strategies for converting UTC data to local windows for arbitrary resolutions and timezones
Date: 2023-10-04 15:58:05
Message-ID: CABcidkLmpJ6GrW9ma6mQq1vVEjdoLxtxndfr9E-zJNCKa9bPVw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Thank you to all who have weighed in! Very much appreciated.

A few thoughts based on what I've read:

> As far as Lincoln describes it, the series is not the real problem here,
but is just intended to be a simplified example of his actual data.

This both is and isn't the case. I was using gneerate_series to create some
data for testing purposes, but I also would love to be able to use
generate_series for the logic as well.

> SELECT
> sub.gs AS ts_in_utc
> ,sub.gs AT TIME ZONE 'America/New_York' AS ts_in_local_tz
> ,date_bin('1 days', sub.gs AT TIME ZONE 'America/New_York',
> '2023-01-01')
> FROM (SELECT generate_series('2023-11-03 00:00:00Z'::timestamptz,
> '2023-11-07 00:00:00Z'::timestamptz, '5 minutes'::interval) AS gs) AS sub
> WHERE
> sub.gs >= '2023-11-05 00:00:00 America/New_York'::timestamptz AND
> sub.gs < '2023-11-06 00:00:00 America/New_York'::timestamptz

I believe this query will be funky around DST borders, because `sub.gs AT
TIME ZONE 'America/New_York'` will be localized in a way that erases the
difference between hours with different offsets, which are genuinely
different. For instance, I ran this and there are two rows within it that
look like:

` 2023-11-05 05:00:00+00 | 2023-11-05 01:00:00 | 2023-11-05 00:00:00`
and
` 2023-11-05 06:00:00+00 | 2023-11-05 01:00:00 | 2023-11-05 00:00:00`

I think that the non-unique second column will pose an issue for the date
binning at a resolution finer than 1 day.

> Note that setting the time zone is a client/connection setting so if you
> set it within a transaction, it will stay set when the transaction
> concludes. But time manipulation is tricky and trying to DIY reinvent the
> wheel is painful and often buggy. Let PostgreSQL do the work for you.

Yeah, I'm definitely interested in keeping as much of the DST stuff outside
my code as possible. I guess my concern is that I think there are other
places where this database is being used in my codebase that may rely on
the database time setting being UTC (or really, GMT, though I don't think
there's a difference for these purposes). It would be best if all of my
application's code declared its intentions about the time zone of the
database connection before running its query, but I don't think that's a
feasible change to make right now. That's what's motivating my pursuit of
finding a way to write these queries without changing this setting, through
appropriate casting and such.

> create function generate_series(timestamptz, timestamptz, interval, text)
> returns setof timestamptz
strict immutable language plpgsql as
$$
begin
perform set_config('timezone', $4, true);
return query select generate_series($1, $2, $3);
end
$$ set timezone = 'UTC';

This is a nice suggestion, and in fact, it would be fine from my
perspective to reset to UTC every time. My concern is only around the
safety of the final `set timezone`. Under what circumstances/when can I
count on that being set? E.g. if a query using that function was cancelled
before finishing, would the connection timezone remain as $4? I guess the
3rd parameter to set_config is `is_local` (based on
https://pgpedia.info/s/set_config.html). Does that mean I could run this
outside this context of a function, and expect the setting to go back to
UTC on a rollback? Apologies if these are naive questions.

Thanks again for all the help.

Best,
Lincoln

On Wed, Oct 4, 2023 at 11:09 AM Marian Wendt <marian(dot)wendt(at)yahoo(dot)com> wrote:

>
> Am 04.10.2023 um 16:11 schrieb Tom Lane:
>
> Steve Crawford <scrawford(at)pinpointresearch(dot)com> <scrawford(at)pinpointresearch(dot)com> writes:
>
> On Tue, Oct 3, 2023 at 7:33 PM Lincoln Swaine-Moore <lswainemoore(at)gmail(dot)com> <lswainemoore(at)gmail(dot)com>
> wrote:
>
> 5) Ideally, the solution would not involve messing with the
> server/connection's value of timezone. (Though I would be interested if
> there was a solution that relaxed this constraint and was relatively
> safe/compatible with transactions and psycopg2.)
>
> Note that setting the time zone is a client/connection setting so if you
> set it within a transaction, it will stay set when the transaction
> concludes. But time manipulation is tricky and trying to DIY reinvent the
> wheel is painful and often buggy. Let PostgreSQL do the work for you.
>
> Expanding on that philosophy: you should be able to set the timezone
> locally within a function, so that it wouldn't be that hard to make a
> wrapper for generate_series that emulates the 4-argument version added
> in v16.
>
> Rather than messing with manually saving and restoring the prevailing
> zone, I'd let the function SET infrastructure do it for me. Sadly,
> that SET clause only takes literal constant arguments, so it'd go
> roughly like this:
>
> create function generate_series(timestamptz, timestamptz, interval, text)
> returns setof timestamptz
> strict immutable language plpgsql as
> $$
> begin
> perform set_config('timezone', $4, true);
> return query select generate_series($1, $2, $3);
> end
> $$ set timezone = 'UTC';
>
> Setting the zone to UTC is a useless step, but that triggers
> restoring the previous zone when the function exits; simpler
> and probably faster than coding the save/restore explicitly.
>
> Side note: whether this is really "immutable" is a matter for
> debate, since time zone definitions tend to change over time.
> But we chose to mark the new 4-argument version that way,
> so you might as well do so too.
>
> regards, tom lane
>
>
>
>
> As far as Lincoln describes it, the series is not the real problem here,
> but is just intended to be a simplified example of his actual data. The
> consideration that you can use the time zone using a function should apply
> here... The following SELECT should show as an example that the desired
> result can be achieved (TIMEZONE set to 'Etc/UTC'). Variations of
> date_bin for 15 minutes or 1 hour should work similarly...
>
> SELECT
> sub.gs AS ts_in_utc
> ,sub.gs AT TIME ZONE 'America/New_York' AS ts_in_local_tz
> ,date_bin('1 days', sub.gs AT TIME ZONE 'America/New_York',
> '2023-01-01')
> FROM (SELECT generate_series('2023-11-03 00:00:00Z'::timestamptz,
> '2023-11-07 00:00:00Z'::timestamptz, '5 minutes'::interval) AS gs) AS sub
> WHERE
> sub.gs >= '2023-11-05 00:00:00 America/New_York'::timestamptz AND
> sub.gs < '2023-11-06 00:00:00 America/New_York'::timestamptz
>
> --
> regards, marian wendt
>

--
Lincoln Swaine-Moore

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Johnson, Bruce E - (bjohnson) 2023-10-04 16:04:34 Re: [EXT]Re: Strange error trying to import with Ora2PG
Previous Message Adrian Klaver 2023-10-04 15:38:12 Re: [EXT]Re: Strange error trying to import with Ora2PG