Re: UUID v7

From: Sergey Prokhorenko <sergeyprokhorenko(at)yahoo(dot)com(dot)au>
To: Peter Eisentraut <peter(at)eisentraut(dot)org>, "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>
Cc: Aleksander Alekseev <aleksander(at)timescale(dot)com>, pgsql-hackers mailing list <pgsql-hackers(at)postgresql(dot)org>, Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>, Przemysław Sztoch <przemyslaw(at)sztoch(dot)pl>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Mat Arye <mat(at)timescaledb(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, Junwang Zhao <zhjwpku(at)gmail(dot)com>
Subject: Re: UUID v7
Date: 2024-04-06 21:59:38
Message-ID: 305478845.5279532.1712440778735@mail.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

For every complex problem there is an answer that is clear, simple, and wrong. Since the RFC allows microsecond timestamp granularity, the first thing that comes to everyone's mind is to insert microsecond granularity into UUIDv7. And if the RFC allowed nanosecond timestamp granularity, then they would try to insert nanosecond granularity into UUIDv7.
But I am categorically against abandoning the counter under pressure from the unfounded proposal to replace the counter with microsecond granularity.
1) The RFC specifies millisecond timestamp granularity by default.
2) All advanced UUIDv7 implementations include a counter:• for JavaScript https://www.npmjs.com/package/uuidv7• for Rust https://crates.io/crates/uuid7• for Go (Golang) https://pkg.go.dev/github.com/gofrs/uuid#NewV7• for Python https://github.com/oittaa/uuid6-python
3) The theoretical performance of generating UUIDv7 without loss of monotonicity for microsecond granularity is only 1000 UUIDv7 per millisecond. This is very low and insufficient generation performance! But the actual generation performance is even worse, since the generation demand is unevenly distributed within a millisecond. Therefore, a UUIDv7 will not be generated every microsecond.
For a counter 18 bits long, with the most significant bit initialized to zero and the remaining bits initialized to a random number, the actual performance of generating a UUIDv7 without loss of monotonicity is between 2 to the power of 17 = 131072 UUIDv7 per millisecond (if the random number happens to be all ones) to 2 to the power of 18 = 262144 UUIDv7 per millisecond (if the random number happens to be all zeros). This is more than enough.
4) Microsecond timestamp fraction subtracts 10 bits from random data, which increases the risk of collision. In the counter, almost all bits are initialized with a random number, which reduces the risk of collision.

The only reasonable use of microsecond granularity is when writing to a database table in parallel. However, monotonicity in this case can be ensured in another way, namely a single UUIDv7 generator per database table, similar to SERIAL (https://postgrespro.com/docs/postgresql/16/datatype-numeric#DATATYPE-SERIAL) in PostgreSQL.
Best regards,
Sergey Prokhorenkosergeyprokhorenko(at)yahoo(dot)com(dot)au

On Thursday, 4 April 2024 at 09:12:17 pm GMT+3, Andrey M. Borodin <x4mmm(at)yandex-team(dot)ru> wrote:


...

At this point we can skip the counter\microseconds entirely, just fill everything after unix_ts_ms with randomness. It's still a valid UUIDv7, exhibiting much more data locality than UUIDv4. We can adjust this sortability measures later.

Best regards, Andrey Borodin.

In response to

  • Re: UUID v7 at 2024-04-04 18:12:10 from Andrey M. Borodin

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Verite 2024-04-06 22:00:25 Re: Fixing backslash dot for COPY FROM...CSV
Previous Message Alexander Korotkov 2024-04-06 21:52:47 Re: [HACKERS] make async slave to wait for lsn to be replayed