From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | Andrey Borodin <x4mmm(at)yandex-team(dot)ru> |
Cc: | Daniel Verite <daniel(at)manitou-mail(dot)org>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>, Sergey Prokhorenko <sergeyprokhorenko(at)yahoo(dot)com(dot)au>, Przemysław Sztoch <przemyslaw(at)sztoch(dot)pl>, Michael Paquier <michael(at)paquier(dot)xyz>, Aleksander Alekseev <aleksander(at)timescale(dot)com>, Pgsql-Hackers Mailing List <pgsql-hackers(at)postgresql(dot)org>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Mat Arye <mat(at)timescaledb(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, Stepan Neretin <sncfmgg(at)gmail(dot)com> |
Subject: | Re: UUID v7 |
Date: | 2025-03-26 04:32:49 |
Message-ID: | CAD21AoAVU-VnuqvdafqUCHCAo7Msb6_2k7nSzd_fijMbbtKrCg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Sun, Feb 9, 2025 at 9:07 AM Andrey Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
>
> I've took into account note from Sergey that "offset" is better name for uuidv7() argument than "shift".
>
> > On 5 Feb 2025, at 03:02, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> >>
> >> I was thinking about incorporating test like this.
> >>
> >>>> With this patch we can generate correct UUIDs in a very distant future.
> >>>> postgres=# select x, uuid_extract_timestamp(uuidv7((x::text || ' year'::text)::interval)),
> >>>> (x::text || ' year'::text)::interval
> >>>> from generate_series(1,9000,1000) x;
> >>>> x | uuid_extract_timestamp | interval
> >>>> ------+-----------------------------+------------
> >>>> 1 | 2026-01-31 12:00:53.084+05 | 1 year
> >>>> 1001 | 3026-01-31 12:00:53.084+05 | 1001 years
> >>>> 2001 | 4026-01-31 12:00:53.084+05 | 2001 years
> >>>> 3001 | 5026-01-31 12:00:53.084+05 | 3001 years
> >>>> 4001 | 6026-01-31 12:00:53.084+05 | 4001 years
> >>>> 5001 | 7026-01-31 12:00:53.085+05 | 5001 years
> >>>> 6001 | 8026-01-31 12:00:53.085+05 | 6001 years
> >>>> 7001 | 9026-01-31 12:00:53.085+05 | 7001 years
> >>>> 8001 | 10026-01-31 12:00:53.085+05 | 8001 years
> >>>> (9 rows)
> >>
> >
> > Something like following queries might be workable for example?
> >
> > create table test (c serial, d uuid, t timestamptz generated always as
> > (uuid_extract_timestamp(d)) stored);
> > insert into test (d) select uuidv7((n || 'years')::interval) from
> > generate_series(1, 2000) n;
> > select count(*) from (select t - lag(t) over (order by c) as diff from
> > test) where diff > '10 year' ;
>
> Yeah, makes sense. I reduced tolerance to 366+1 day. Must be stable if we've done all the time offset business right.
>
> > Here are some review comments:
> >
> > #define NS_PER_S INT64CONST(1000000000)
> > #define NS_PER_MS INT64CONST(1000000)
> > +#define US_PER_MS INT64CONST(1000)
> > #define NS_PER_US INT64CONST(1000)
> >
> > I think it's clear if we put US_PER_MS below NS_PER_US.
>
> OK.
>
> > ---
> > *
> > - * ns is a number of nanoseconds since start of the UNIX epoch. This value is
> > + * unix_ts_ms is a number of milliseconds since start of the UNIX epoch,
> > + * ns_in_ms is a number of nanoseconds within millisecond. These values are
> > * used for time-dependent bits of UUID.
> >
> > I think we can mention that the RFC describes that stored unix
> > timestamp as an unsigned integer.
>
> Done. Feel free to adjust my wordings, I've no sense of idiomatic English.
>
> >
> > ---
> > static pg_uuid_t *
> > -generate_uuidv7(int64 ns)
> > +generate_uuidv7(uint64 unix_ts_ms, uint32 ns_in_ms)
> >
> > How about renaming ns_in_ms with sub_ms?
>
> OK.
>
> >
> > ---
> > + /* 64 bits is enough for real time, but not for a time range of UUID */
> >
> > I could not understand the point of this comment. It seems to say that
> > 64-bits is not enough for a time range of UUID, but doesn't the time
> > range of UUIDv7 use only 48 bits? It seems to need more comments.
>
> I've tried to say that acquiring current time as an int64 ns since UNIX epoch is still viable for the code (until year 2262).
>
>
> > ---
> > - ns = (ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) *
> > SECS_PER_DAY * USECS_PER_SEC)
> > - * NS_PER_US + ns % NS_PER_US;
> > + us = (ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) *
> > SECS_PER_DAY * USECS_PER_SEC);
> >
> > /* Generate an UUIDv7 */
> > - uuid = generate_uuidv7(ns);
> > + uuid = generate_uuidv7(us / US_PER_MS, (us % US_PER_MS) *
> > NS_PER_US + ns % NS_PER_US);
> >
> > Need to update comments in uuidv7_internval() such as:
> >
> > /*
> > * Shift the current timestamp by the given interval. To calculate time
> > * shift correctly, we convert the UNIX epoch to TimestampTz and use
> > * timestamptz_pl_interval(). Since this calculation is done with
> > * microsecond precision, we carry nanoseconds from original ns value to
> > * shifted ns value.
> > */
> >
> > and
> >
> > /*
> > * Convert a TimestampTz value back to an UNIX epoch and back nanoseconds.
> > */
>
> I've tried. I'm not very satisfied with comments, but could not come up with easier description.
>
Thank you for updating the patch. I had missed to track this patch.
I've updated the patch from your v4 patch. In this version, I excluded
the argument name change (from 'shift' to 'offset') as it's not
related to the bug fix and simplified the regression test case.
Please review it.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Attachment | Content-Type | Size |
---|---|---|
v5-0001-Fix-timestamp-overflow-in-UUIDv7-implementation.patch | application/octet-stream | 6.9 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Amit Kapila | 2025-03-26 04:51:11 | Re: Enhance 'pg_createsubscriber' to retrieve databases automatically when no database is provided. |
Previous Message | vignesh C | 2025-03-26 04:31:40 | Re: Enhance 'pg_createsubscriber' to retrieve databases automatically when no database is provided. |