Re: Sequence vs UUID

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Rob Sargent <robjsargent(at)gmail(dot)com>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Sequence vs UUID
Date: 2023-01-26 22:42:57
Message-ID: CAHyXU0zJ3dA+WOu7NOk0jbVXcYycc43589NgZBfxK3LNb-_McQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Thu, Jan 26, 2023 at 3:50 PM Rob Sargent <robjsargent(at)gmail(dot)com> wrote:

> On 1/26/23 14:36, Merlin Moncure wrote:
>
> On Thu, Jan 26, 2023 at 1:18 PM veem v <veema0000(at)gmail(dot)com> wrote:
>
>> Hello, We were trying to understand whether we should use UUID or
>> Sequence in general for primary keys. In many of the blogs(one is below)
>> across multiple databases, I saw over the internet and all are mostly
>> stating the sequence is better as compared to UUID. But I think in the case
>> of concurrent data load scenarios UUID will spread the contention point
>> whereas sequence can be a single point of contention.
>>
>> So we want to understand from experts here, if there are any clear rules
>> available or if we have any pros vs cons list available for each of
>> those to understand the exact scenario in which we should go for one over
>> other? Basically I wanted to see if we can perform some test on sample
>> data to see the percentage of overhead on read and write performances of
>> the query in presence of UUID VS Sequence to draw some conclusion in
>> general? And also considering open source postgres as the base for many
>> databases like redshift etc, so the results which apply to
>> progress would apply to others as well.
>>
>>
>> https://www.percona.com/blog/2019/11/22/uuids-are-popular-but-bad-for-performance-lets-discuss/
>>
> * UUIDs are big for identifer (16 bytes) sequence is 4 or 8
> * sequences are NOT a contention point, database uses some tricks to work
> around that
> * UUIDS are basically random data causing page fragmentation. this is
> particularly bad in auto clustering architectures like sql server
> * Also, UUIDS can deliver very poor buffer hit ratios when sweeping ranges
> of records on large tables.
>
>
> "Sweeping ranges of records" by ID suggests you have information in your
> id. If you're comfortable with sequence generation as a surrogate for
> time-point of entry, by all means have at it.
>

Yeah, sequences preserve time locality. Clustering the table on the
primary key would then optimize certain cases. Clustering is useless on
guids. Sequences do better here and hybrid natural key models can do
better still. Buffer hit ratio and read amplification can be major issues
if not understood and controlled for.

> For many of us, contemporaneously generated records have nothing to do
> with each other. (And they carry attributes which groups them.)
>
> UUIDs do a very good job of avoiding id-overlap across domains (compare to
> a sequence for each domain).
>

While true, there are other solutions to the issue, one or more of,
* separating id generation from storage
* reserving id ranges
* creating domain id composite alongside generated id

Relying solely on guid to generate uniqueness seems like a broken model
anyways in the general case. Whatever thing is responsible for
suppressing information conflicts ought to be emitting the id.

merlin

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Ron 2023-01-27 00:48:50 Re: Sequence vs UUID
Previous Message Benedict Holland 2023-01-26 22:00:04 Re: Sequence vs UUID