Re: UUID v7

From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Daniel Verite <daniel(at)manitou-mail(dot)org>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>, Sergey Prokhorenko <sergeyprokhorenko(at)yahoo(dot)com(dot)au>, Przemysław Sztoch <przemyslaw(at)sztoch(dot)pl>, Michael Paquier <michael(at)paquier(dot)xyz>, Aleksander Alekseev <aleksander(at)timescale(dot)com>, Pgsql-Hackers Mailing List <pgsql-hackers(at)postgresql(dot)org>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Mat Arye <mat(at)timescaledb(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, Stepan Neretin <sncfmgg(at)gmail(dot)com>
Subject: Re: UUID v7
Date: 2025-02-02 10:15:26
Message-ID: E8200F84-C7B0-412D-A3E7-BFD178240610@yandex-team.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On 31 Jan 2025, at 23:49, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> Thank you for the patch! I agree with the basic direction of this fix.
> Here are some review comments:
>
> ---
> -static inline int64 get_real_time_ns_ascending();
> +static inline uint64 get_real_time_ns_ascending();
>
> IIUC we don't need to replace int64 with uint64 if we have two
> separate parameters for generate_uuidv7(). It seems to be conventional
> to use a signed int for timestamps.

OK, done.

>
> ---
> Need to update the function comment of generate_uuidv7() as we changed
> the function arguments.

Done.

>
> ---
> - ns = (ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) *
> SECS_PER_DAY * USECS_PER_SEC)
> - * NS_PER_US + ns % NS_PER_US;
> + us = (ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) *
> SECS_PER_DAY * USECS_PER_SEC);
>
> /* Generate an UUIDv7 */
> - uuid = generate_uuidv7(ns);
> + uuid = generate_uuidv7(us / 1000, (us % 1000) * 1000 + ns % NS_PER_US);
>
> I think we can have an inline function or a marco (or use TMODULO()?)
> to split nanoseconds into milliseconds and sub-milliseconds so that
> uuidv7() and uuidv7_interval() can pass them to generate_uuidv7().

I doubt that such macro will make core more readable. I've replaced 1000 with macros.

>
> The comments in uuidv7_interval() also need to be updated accordingly.

Done.

>
> ---
> I think we need to consider how we can handle the timestamp shifting.
> UUIDv7 contains 48 bits Unix timestamp at milliseconds precision,
> which can represent timestamps approximately between 2493 BC and 6432
> AC. If users specify an interval to shift the timestamp beyond the
> range, 48-bits timestamp would be wrapped around and they would not be
> able to get an expected result. Do we need to raise an error in that
> case?
>
> ---
> Another problem I found in uuid_extract_timestamp() is that it cannot
> correctly extract a timestamp before 1970/1/1 stored in a UUIDv7
> value:
>
> postgres(1:1795331)=# select year, uuid_extract_timestamp(uuidv7((year
> || 'year ago')::interval)) from generate_series(54, 56) year;
> year | uuid_extract_timestamp
> ------+-----------------------------
> 54 | 1971-01-31 10:46:25.111-08
> 55 | 1970-01-31 10:46:25.111-08
> 56 | 10888-09-01 17:18:15.768-07
> (3 rows)
>
> The problem is that we correctly store a negative timestamp value in a
> UUIDv7 value but uuid_extract_timestamp() unconditionally treats it as
> a positive timestamp value. I think this is a separate bug we need to
> fix.

RFC says unix_ts_ms is unsigned. So, luckily, no BC dates. I bet Pharaohs could not measure nanoseconds.
I think it's totally fine to wrap UUID values around year 10598 without an error.

I was thinking about incorporating test like this.

>> With this patch we can generate correct UUIDs in a very distant future.
>> postgres=# select x, uuid_extract_timestamp(uuidv7((x::text || ' year'::text)::interval)),
>> (x::text || ' year'::text)::interval
>> from generate_series(1,9000,1000) x;
>> x | uuid_extract_timestamp | interval
>> ------+-----------------------------+------------
>> 1 | 2026-01-31 12:00:53.084+05 | 1 year
>> 1001 | 3026-01-31 12:00:53.084+05 | 1001 years
>> 2001 | 4026-01-31 12:00:53.084+05 | 2001 years
>> 3001 | 5026-01-31 12:00:53.084+05 | 3001 years
>> 4001 | 6026-01-31 12:00:53.084+05 | 4001 years
>> 5001 | 7026-01-31 12:00:53.085+05 | 5001 years
>> 6001 | 8026-01-31 12:00:53.085+05 | 6001 years
>> 7001 | 9026-01-31 12:00:53.085+05 | 7001 years
>> 8001 | 10026-01-31 12:00:53.085+05 | 8001 years
>> (9 rows)

or maybe something simple like

with u as (select uuidv7() id) select uuid_extract_timestamp(uuidv7('9999-09-09 12:34:56.789+05' - uuid_extract_timestamp(u.id))) from u;

But it would still be flaky, second call to uuidv7() can overflow a millisecond.

Thanks!

Best regards, Andrey Borodin.

Attachment Content-Type Size
v2-0001-UUDv7-fix-offset-computations-in-dates-after-2262.patch application/octet-stream 3.8 KB

In response to

  • Re: UUID v7 at 2025-01-31 18:49:48 from Masahiko Sawada

Responses

  • Re: UUID v7 at 2025-02-02 19:40:17 from Sergey Prokhorenko

Browse pgsql-hackers by date

  From Date Subject
Next Message Dmitry Koterov 2025-02-02 11:43:27 Increased work_mem for "logical replication tablesync worker" only?
Previous Message jian he 2025-02-02 09:41:15 Re: Proposal to Enable/Disable Index using ALTER INDEX (with patch)