Re: UUID v7

From: Junwang Zhao <zhjwpku(at)gmail(dot)com>
To: Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>
Cc: Sergey Prokhorenko <sergeyprokhorenko(at)yahoo(dot)com(dot)au>, pgsql-hackers mailing list <pgsql-hackers(at)postgresql(dot)org>, Aleksander Alekseev <aleksander(at)timescale(dot)com>, "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, Przemysław Sztoch <przemyslaw(at)sztoch(dot)pl>, Nikolay Samokhvalov <nik(at)postgres(dot)ai>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Nick Babadzhanian <pgnickb(at)gmail(dot)com>, Mat Arye <mat(at)timescaledb(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, "Kyzer Davis (kydavis)" <kydavis(at)cisco(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, "brad(at)peabody(dot)io" <brad(at)peabody(dot)io>, Kirk Wolak <wolakk(at)gmail(dot)com>
Subject: Re: UUID v7
Date: 2024-01-29 13:58:55
Message-ID: CAEG8a3+jKVT=wRbeT=dC25Tm_NyXZ-XQTPrDoOnJnGj12A5K4g@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jan 29, 2024 at 7:38 PM Jelte Fennema-Nio <postgres(at)jeltef(dot)nl> wrote:
>
> tl;dr I believe we should remove the uuidv7(timestamp) function from
> this patchset.
>
> On Thu, 25 Jan 2024 at 18:04, Sergey Prokhorenko
> <sergeyprokhorenko(at)yahoo(dot)com(dot)au> wrote:
> > In this case the documentation must state that the functions uuid_extract_time() and uuidv7(T) are against the RFC requirements, and that developers may use these functions with caution at their own risk, and these functions are not recommended for production environment.
> >
> > The function uuidv7(T) is not better than uuid_extract_time(). Careless developers may well pass any business date into this function: document date, registration date, payment date, reporting date, start date of the current month, data download date, and even a constant. This would be a profanation of UUIDv7 with very negative consequences.
>
> After re-reading the RFC more diligently, I'm inclined to agree with
> Sergey that uuidv7(timestamp) is quite problematic. And I would even
> say that we should not provide uuidv7(timestamp) at all, and instead
> should only provide uuidv7(). Providing an explicit timestamp for
> UUIDv7 is explicitly against the spec (in my reading):
>
> > Implementations acquire the current timestamp from a reliable
> > source to provide values that are time-ordered and continually
> > increasing. Care must be taken to ensure that timestamp changes
> > from the environment or operating system are handled in a way that
> > is consistent with implementation requirements. For example, if
> > it is possible for the system clock to move backward due to either
> > manual adjustment or corrections from a time synchronization
> > protocol, implementations need to determine how to handle such
> > cases. (See Altering, Fuzzing, or Smearing below.)
> >
> > ...
> >
> > UUID version 1 and 6 both utilize a Gregorian epoch timestamp
> > while UUIDv7 utilizes a Unix Epoch timestamp. If other timestamp
> > sources or a custom timestamp epoch are required, UUIDv8 MUST be
> > used.
> >
> > ...
> >
> > Monotonicity (each subsequent value being greater than the last) is
> > the backbone of time-based sortable UUIDs.
>
> By allowing users to provide a timestamp we're not using a continually
> increasing timestamp for our UUIDv7 generation, and thus it would not
> be a valid UUIDv7 implementation.
>
> I do agree with others, however, that being able to pass in an
> arbitrary timestamp for UUID generation would be very useful. For
> example, to be able to partition by the timestamp in the UUID and then
> later load data for an older timestamp and have it be
> added to the older partition. But it's possible to do that while
> still following the spec, by using a UUIDv8 instead of UUIDv7. So for
> this use case we could make a helper function that generates a UUIDv8
> using the same format as a UUIDv7, but allows storing arbitrary
> timestamps. You might say, why not slightly change UUIDv7 then? Well,
> mainly because of this critical sentence in the RFC:
>
> > UUIDv8's uniqueness will be implementation-specific and MUST NOT be assumed.
>
> That would allow us to say that using this UUIDv8 helper requires
> careful usage and checks if uniqueness is required.
>
> So I believe we should remove the uuidv7(timestamp) function from this patchset.

Agreed. Section 6.1 of the RFC draft [1] states:

```
UUID version 1 and 6 both utilize a Gregorian epoch timestamp while
UUIDv7 utilizes a Unix Epoch timestamp. If other timestamp sources or
a custom timestamp epoch are required, UUIDv8 MUST be used.
```

In contrib/uuid-ossp, uuidv1 does not allow the user to supply a
custom timestamp, so I think it should be the same for uuidv6 and
uuidv7.

And I have the same feeling that we should not consider v6 and v8 in
this patch.
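
For anyone who wants to experiment with the UUIDv8 helper idea outside
this patch, here is a rough Python sketch. It is not PostgreSQL code and
the name uuidv8_from_timestamp is made up for illustration. It assumes
the UUIDv7 bit layout (48-bit Unix millisecond timestamp, then version
and variant bits, the rest random) and only changes the version to 8:

```python
import os
import uuid
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def uuidv8_from_timestamp(ts: datetime) -> uuid.UUID:
    """Hypothetical helper: UUIDv7-shaped value tagged as version 8."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    # Whole milliseconds since the Unix epoch, computed without floats.
    ms = (ts - UNIX_EPOCH) // timedelta(milliseconds=1)
    if not 0 <= ms < (1 << 48):
        raise ValueError("timestamp does not fit in 48 bits")
    # 48-bit timestamp in the most significant bits, 80 random bits below.
    value = (ms << 80) | int.from_bytes(os.urandom(10), "big")
    # Overwrite the version nibble (bits 76-79) with 8.
    value = (value & ~(0xF << 76)) | (0x8 << 76)
    # Overwrite the variant bits (bits 62-63) with binary 10.
    value = (value & ~(0x3 << 62)) | (0x2 << 62)
    return uuid.UUID(int=value)
```

Because the timestamp is caller-supplied, nothing here guarantees
monotonicity or uniqueness, which matches the RFC's warning about
UUIDv8.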

[1]: https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis-14#section-6.1-2.4.1

>
> I don't see a problem with including uuid_extract_time though. Afaict
> the only thing the RFC says about extracting timestamps is that the
> RFC does not give a requirement or guarantee about how close the
> stored timestamp is to the actual time:
>
> > Implementations MAY alter the actual timestamp. Some examples
> > include security considerations around providing a real clock
> > value within a UUID, to correct inaccurate clocks, to handle leap
> > seconds, or instead of dividing a number of microseconds by 1000
> > to obtain a millisecond value; dividing by 1024 (or some other
> > value) for performance reasons. This specification makes no
> > requirement or guarantee about how close the clock value needs to
> > be to the actual time.
>
> I see no reason why we cannot make stronger guarantees about the
> timestamps that we use to generate UUIDs with our uuidv7() function.
> And then we can update the documentation for
> uuid_extract_time to something like this:
>
> > This function extracts a timestamptz from UUID versions 1, 6 and 7. For other
> > versions and variants this function returns NULL. The extracted timestamp
> > does not necessarily equate to the time of UUID generation. How close it is
> > to the actual time depends on the implementation that generated the UUID.
> > The uuidv7() function provided by PostgreSQL will normally store the actual time of
> > generation in the UUID, but if large batches of UUIDs are generated at the
> > same time it's possible that some UUIDs will store a time that is slightly later
> > than their actual generation time.
>
>
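
To illustrate what extracting a time from versions 1, 6 and 7 involves,
here is a rough Python sketch. It is not the patch's
uuid_extract_time(); the name extract_time is hypothetical, and it
returns None where the proposed documentation says NULL. The field
layouts are the ones described in the RFC draft:

```python
import uuid
from datetime import datetime, timedelta, timezone

GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def extract_time(u: uuid.UUID):
    """Hypothetical sketch: return the embedded timestamp, or None."""
    if u.variant != uuid.RFC_4122:
        return None
    value = u.int
    version = (value >> 76) & 0xF
    if version == 7:
        # Top 48 bits: milliseconds since the Unix epoch.
        # Values past year 9999 would raise OverflowError here.
        return UNIX_EPOCH + timedelta(milliseconds=value >> 80)
    if version == 1:
        # 60-bit timestamp stored low field first.
        time_low = value >> 96
        time_mid = (value >> 80) & 0xFFFF
        time_hi = (value >> 64) & 0x0FFF
        ticks = (time_hi << 48) | (time_mid << 32) | time_low
    elif version == 6:
        # Same 60-bit timestamp, reordered so the high bits come first.
        time_high = value >> 96
        time_mid = (value >> 80) & 0xFFFF
        time_low = (value >> 64) & 0x0FFF
        ticks = (time_high << 28) | (time_mid << 12) | time_low
    else:
        return None
    # Ticks are 100 ns intervals since 1582-10-15; truncate to microseconds.
    return GREGORIAN_EPOCH + timedelta(microseconds=ticks // 10)
```

As the quoted text says, the result is only whatever the generator
chose to store, not necessarily the real generation time.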

--
Regards
Junwang Zhao

In response to

  • Re: UUID v7 at 2024-01-29 11:38:24 from Jelte Fennema-Nio
