Re: UUID v7

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>
Cc: Sergey Prokhorenko <sergeyprokhorenko(at)yahoo(dot)com(dot)au>, Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>, Michael Paquier <michael(at)paquier(dot)xyz>, Aleksander Alekseev <aleksander(at)timescale(dot)com>, pgsql-hackers mailing list <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Przemysław Sztoch <przemyslaw(at)sztoch(dot)pl>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Mat Arye <mat(at)timescaledb(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, Stepan Neretin <sncfmgg(at)gmail(dot)com>
Subject: Re: UUID v7
Date: 2024-11-07 07:42:05
Message-ID: CAD21AoBVxi5hZJnoyN-PKhd5UxCFtooRiDpsTDs3Eg=rROgsBA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Nov 6, 2024 at 10:14 AM Andrey M. Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
>
>
>
> > On 5 Nov 2024, at 23:56, Andrey M. Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
> >
> > <v30-0001-Implement-UUID-v7.patch>
>
> Some more thoughts on this patch version:
>
> 0. Comment mentioning nanoseconds, while we do not need to carry anything
> /* Convert TimestampTz back and carry nanoseconds. */
>
> 1. There's unnecessary &3 in
> uuid->data[7] = uuid->data[7] | ((uuid->data[8] >> 6) & 3);
>
> 2. Currently we store 0..999 microseconds in 10 bits, so values 1000..1023 are unused. We could use them for overflow. That would slightly increase non-overflowing capacity when generating more than million UUIDs per second on one backend. However, given current performance of our CSPRNG I do not think this feature worth code complexity.
>

While using only 10 bits microseconds makes the implementation simple,
I'm not sure if 10 bits is enough to generate UUIDs at microsecond
granularity without losing monotonicity. Since 10-bit microseconds are
used as is in rand_a space, 1000 UUIDs can be generated per
millisecond without losing monotonicity.

For example, in my environment, it took 1808 milliseconds to generate
1 million UUIDs. This is about 533 UUIDs generated per millisecond. As
UUID generation performance improves, I think 10 bits will not be
enough.

=# select count(uuidv7()) from generate_series(1, 1_000_000);
count
---------
1000000
(1 row)

Time: 1808.734 ms

I found a similar comment from Sergey Prokhorenko[1]. He also mentioned:

> 4) Microsecond timestamp fraction subtracts 10 bits from random data, which increases the risk of collision. In the counter, almost all bits are initialized with a random number, which reduces the risk of collision.

I feel that it's better to switch to Method 1 or 2 with 12 bits or
larger counter space.

Regards,

[1] https://www.postgresql.org/message-id/305478845.5279532.1712440778735%40mail.yahoo.com

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

  • Re: UUID v7 at 2024-11-06 18:14:29 from Andrey M. Borodin

Responses

  • Re: UUID v7 at 2024-11-07 08:34:14 from Andrey M. Borodin

Browse pgsql-hackers by date

  From Date Subject
Next Message Andy Fan 2024-11-07 08:05:01 Re: Deleting older versions in unique indexes to avoid page splits
Previous Message jian he 2024-11-07 07:10:33 Re: not null constraints, again