Re: Sequence vs UUID

From: veem v <veema0000(at)gmail(dot)com>
To: Miles Elam <miles(dot)elam(at)productops(dot)com>, pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Sequence vs UUID
Date: 2023-01-30 16:11:34
Message-ID: CAB+=1TV9gpAA-cGt_AShCmMA9fiu9jAa4egZF+oMELApOz-JYQ@mail.gmail.com
Lists: pgsql-general

I was trying to test simple read/write performance for bigint vs UUID
keys. What we see is roughly a 3x slowdown when joining on UUID columns
compared to bigint columns. Generating the values themselves shows a
similar gap: fetching sequence values is about 2.5x faster than
generating UUIDs. Insert performance on the same table for ~10 million
rows is ~1 min 39 sec for bigint vs ~3 min 11 sec for UUID. Is this
extent of degradation expected for UUID?

CREATE TABLE test1_UUID (id bigint, source_id varchar(36) PRIMARY KEY, Name varchar(20));
CREATE TABLE test2_UUID (id bigint, source_id varchar(36) PRIMARY KEY, Name varchar(20));

CREATE TABLE test1_bigint (id bigint PRIMARY KEY, source_id varchar(36), Name varchar(20));
CREATE TABLE test2_bigint (id bigint PRIMARY KEY, source_id varchar(36), Name varchar(20));
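One thing to note about the setup: the "UUID" tables store the key as
varchar(36), i.e. the 36-character text form, while Postgres's native
uuid type holds the same value in 16 bytes and a bigint in 8. A quick
sketch of the relative key widths (Python here only illustrates the byte
counts; it is not measuring Postgres itself):

```python
import struct
import uuid

u = uuid.uuid4()

# A bigint key occupies 8 bytes.
bigint_size = struct.calcsize("q")   # 8

# A native Postgres uuid column stores the raw 128-bit value: 16 bytes.
native_uuid_size = len(u.bytes)      # 16

# Stored as varchar(36), the same value is kept as 36 characters of text.
text_uuid_size = len(str(u))         # 36

print(bigint_size, native_uuid_size, text_uuid_size)
```

So part of the gap measured here is text-vs-binary comparison and index
width, not UUID per se; a native uuid column would sit between the two.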

Loaded the same 10 million rows into each table.

explain analyze select * from test1_bigint a, test2_bigint b where a.id = b.id;

Merge Join (cost=12.31..875534.52 rows=10000021 width=100) (actual
time=0.042..6974.575 rows=10000000 loops=1)
Merge Cond: (a.id = b.id)
-> Index Scan using test1_bigint_pkey on test1_bigint a
(cost=0.43..362780.75 rows=10000021 width=50) (actual time=0.020..2070.079
rows=10000000 loops=1)
-> Index Scan using test2_bigint_2_pkey on test2_bigint b
(cost=0.43..362780.75 rows=10000021 width=50) (actual time=0.019..2131.086
rows=10000000 loops=1)
Planning Time: 0.207 ms
Execution Time: 7311.210 ms

set enable_seqscan=off;

explain analyze select * from test1_UUID a, test2_UUID b where a.source_id = b.source_id;

Merge Join (cost=2.75..2022857.05 rows=10000021 width=100) (actual
time=0.043..21954.213 rows=10000000 loops=1)
Merge Cond: ((a.source_id)::text = (b.source_id)::text)
-> Index Scan using test1_uuid_pkey on test1_UUID a (cost=0.56..936420.18
rows=10000021 width=50) (actual time=0.022..7854.143 rows=10000000 loops=1)
-> Index Scan using test2_uuid_2_pkey on test2_UUID b
(cost=0.56..936437.90 rows=10000021 width=50) (actual time=0.017..7971.187
rows=10000000 loops=1)
Planning Time: 0.516 ms
Execution Time: 22292.801 ms

**********

create sequence myseq cache 32767;

select count(nextval('myseq')) from generate_series(1,10000000);
1 row retrieved starting from 1 in 4 s 521 ms (execution: 4 s 502 ms, fetching: 19 ms)

select count(gen_random_uuid()) from generate_series(1,10000000);
1 row retrieved starting from 1 in 11 s 145 ms (execution: 11 s 128 ms, fetching: 17 ms)
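The generation gap can be reproduced outside the database as well. A
rough client-side sketch (timings will vary by machine, and this times
Python's uuid4 rather than Postgres's gen_random_uuid, so only the
relative shape is meaningful):

```python
import time
import uuid
from itertools import count

N = 100_000

# Sequence-like generation: just hand out the next integer.
seq = count(1)
t0 = time.perf_counter()
ints = [next(seq) for _ in range(N)]
t_int = time.perf_counter() - t0

# Random UUID generation: 16 random bytes plus object construction.
t0 = time.perf_counter()
uuids = [uuid.uuid4() for _ in range(N)]
t_uuid = time.perf_counter() - t0

print(f"integers: {t_int:.3f}s  uuid4: {t_uuid:.3f}s")
```

The integer path has essentially no work to do, which is why sequences
stay cheaper regardless of how fast the UUID generator is.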

On Mon, 30 Jan, 2023, 4:59 pm veem v, <veema0000(at)gmail(dot)com> wrote:

> I have a question. As I understand it, usage-wise there are several
> benefits of UUIDs over sequences: for example in a distributed app,
> a multi-master architecture, or sharding, where we may not be able to
> rely on a single point of generation like a sequence.
>
> Purely in terms of performance, the key advantage of a sequence is its
> smaller storage size: more index rows fit in cache, which benefits
> read queries and joins. Fetching a value from a sequence is also
> cheaper than computing a UUID. The downside is during writes, where it
> can become a point of contention under concurrent data load, since
> every incoming request tries to modify the same table/index page. But
> as this blog (
> https://www.2ndquadrant.com/en/blog/sequential-uuid-generators/)
> states, UUIDs can be made sequential, e.g. using a timestamp prefix.
> However, doesn't making the UUID sequential reintroduce exactly that
> disadvantage? Each incoming write will now contend for the same
> table block or the same side of the index branch/leaf during
> concurrent data load, whereas with random UUIDs the writes were spread
> across many blocks, so no single block became a hotspot. Please
> correct me if my understanding is wrong.
>
>
>
> On Sun, 29 Jan, 2023, 10:33 am Miles Elam, <miles(dot)elam(at)productops(dot)com>
> wrote:
>
>> On Sat, Jan 28, 2023 at 8:02 PM Ron <ronljohnsonjr(at)gmail(dot)com> wrote:
>> >
>> > Then it's not a Type 4 UUID, which is perfectly fine; just not random.
>>
>> Yep, which is why it really should be re-versioned to UUIDv8 to be
>> pedantic. In everyday use though, almost certainly doesn't matter.
>>
>> > Also, should now() be replaced by clock_timestamp(), so that it can be
>> > called multiple times in the same transaction?
>>
>> Not necessary. Instead of 122 bits of entropy, you get 106 bits of
>> entropy and a new incremented prefix every minute. now() vs
>> clock_timestamp() wouldn't make a substantive difference. Should still
>> be reasonably safe against the birthday paradox for more than a
>> century when creating more than a million UUIDs per second.
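To make the trade-off quoted above concrete, here is a minimal sketch of
the timestamp-prefix idea from the 2ndQuadrant post. This is a
hypothetical illustration, not the actual sequential-uuid extension; the
function name, the `interval_seconds` parameter, and the 2-byte prefix
width are all assumptions:

```python
import os
import time
import uuid

def sequential_uuid(interval_seconds=60):
    """Hypothetical 'sequential UUID': a 2-byte prefix derived from the
    current time bucket, followed by 14 random bytes. Within one bucket
    all new keys share the prefix, so index writes cluster together;
    each new bucket moves the insertion point forward."""
    bucket = int(time.time() // interval_seconds) % 65536
    return uuid.UUID(bytes=bucket.to_bytes(2, "big") + os.urandom(14))

# Two keys generated in the same time bucket share their leading bytes.
a = sequential_uuid()
b = sequential_uuid()
print(a, b)
```

Whether the localized write hotspot this creates outweighs the index
cache locality it buys is exactly the question raised above; the answer
tends to depend on concurrency level and working-set size.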
