From: | "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru> |
---|---|
To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | Sergey Prokhorenko <sergeyprokhorenko(at)yahoo(dot)com(dot)au>, Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>, Michael Paquier <michael(at)paquier(dot)xyz>, Aleksander Alekseev <aleksander(at)timescale(dot)com>, pgsql-hackers mailing list <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Przemysław Sztoch <przemyslaw(at)sztoch(dot)pl>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Mat Arye <mat(at)timescaledb(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, Stepan Neretin <sncfmgg(at)gmail(dot)com> |
Subject: | Re: UUID v7 |
Date: | 2024-11-07 08:34:14 |
Message-ID: | 672CA2F0-9277-4EFE-808D-4B7D97937739@yandex-team.ru |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
> On 7 Nov 2024, at 12:42, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Wed, Nov 6, 2024 at 10:14 AM Andrey M. Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
>>
>>
>>
>>> On 5 Nov 2024, at 23:56, Andrey M. Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
>>>
>>> <v30-0001-Implement-UUID-v7.patch>
>>
>> Some more thoughts on this patch version:
>>
>> 0. Comment mentioning nanoseconds, while we do not need to carry anything
>> /* Convert TimestampTz back and carry nanoseconds. */
>>
>> 1. There's unnecessary &3 in
>> uuid->data[7] = uuid->data[7] | ((uuid->data[8] >> 6) & 3);
>>
>> 2. Currently we store 0..999 microseconds in 10 bits, so values 1000..1023 are unused. We could use them for overflow. That would slightly increase non-overflowing capacity when generating more than million UUIDs per second on one backend. However, given current performance of our CSPRNG I do not think this feature worth code complexity.
>>
>
> While using only 10 bits microseconds makes the implementation simple,
> I'm not sure if 10 bits is enough to generate UUIDs at microsecond
> granularity without losing monotonicity. Since 10-bit microseconds are
> used as is in rand_a space, 1000 UUIDs can be generated per
> millisecond without losing monotonicity.
We won’t loose monotonicity on one backend. We will just accumulate time shift.
See
+ us = tv.tv_sec * SECS_PER_DAY * USECS_PER_SEC + tv.tv_usec;
+ if (previous_us >= us)
+ us = previous_us + 1;
>
> For example, in my environment, it took 1808 milliseconds to generate
> 1 million UUIDs. This is about 533 UUIDs generated per millisecond.
BTW we can furether improve this performance by buffering CSPRNG. See v6-0003-Use-cached-random-numbers-in-gen_random_uuid-too.patch in this thread.
> As
> UUID generation performance improves, I think 10 bits will not be
> enough.
>
> =# select count(uuidv7()) from generate_series(1, 1_000_000);
> count
> ---------
> 1000000
> (1 row)
>
> Time: 1808.734 ms
>
> I found a similar comment from Sergey Prokhorenko[1]. He also mentioned:
>
>> 4) Microsecond timestamp fraction subtracts 10 bits from random data, which increases the risk of collision. In the counter, almost all bits are initialized with a random number, which reduces the risk of collision.
>
> I feel that it's better to switch to Method 1 or 2 with 12 bits or
> larger counter space.
>
12 bits does not differ much. We can have much longer counters. Before switching to Method 3 I used 18 bits counter. See version v24-0001-Implement-UUID-v7.patch
This version is more resilent to generating a lot of UUIDs on one backend while still not accumulating time shift.
Yet, UUIDs generated on parallel workers will loose some sortability.
Personally, I think both methods are good. I’d even combine them both. But RFC seems to be not allowing this. BTW if we just continue to use nanoseconds patch, zero bits will act exactly as counters.
Best regards, Andrey Borodin.
From | Date | Subject | |
---|---|---|---|
Next Message | Bertrand Drouvot | 2024-11-07 08:41:34 | Re: define pg_structiszero(addr, s, r) |
Previous Message | Michael Paquier | 2024-11-07 08:16:51 | Re: pg_stat_statements: Avoid holding excessive lock |