Re: Sequence vs UUID

From: veem v <veema0000(at)gmail(dot)com>
To: Benedict Holland <benedict(dot)m(dot)holland(at)gmail(dot)com>
Cc: Miles Elam <miles(dot)elam(at)productops(dot)com>, pgsql-general <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Sequence vs UUID
Date: 2023-02-02 19:47:15
Message-ID: CAB+=1TW-+gvMiTdb6fkx0LPE4c6+HPYn-MNWDTTc4sj8pdeyVg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Tested the UUIDv7 generator for postgres as below.

With regards to performance , It's still way behind the sequence. I was
expecting the insert performance of UUID v7 to be closer to the sequence ,
but it doesn't seem so, as it's 500ms vs 3000ms. And the generation takes a
lot longer time as compared to sequence too i.e. 59ms vs 1700ms. Read time
or the index scan looks close i.e. 2.3ms vs 2.6ms.

*************** UUID7 Alternate way ***********************
create or replace function uuid_generate_v7()
returns uuid
as $$
declare
unix_ts_ms bytea;
uuid_bytes bytea;
begin
unix_ts_ms = substring(int8send(floor(extract(epoch from
clock_timestamp()) * 1000)::bigint) from 3);

-- use random v4 uuid as starting point (which has the same variant we
need)
uuid_bytes = uuid_send(gen_random_uuid());

-- overlay timestamp
uuid_bytes = overlay(uuid_bytes placing unix_ts_ms from 1 for 6);

-- set version 7
uuid_bytes = set_byte(uuid_bytes, 6, (b'0111' || get_byte(uuid_bytes,
6)::bit(4))::bit(8)::int);

return encode(uuid_bytes, 'hex')::uuid;
end
$$
language plpgsql
volatile;

*************** Postgress *****************
CREATE UNLOGGED TABLE test_bigint ( id bigint PRIMARY KEY);
CREATE UNLOGGED TABLE test_uuid ( id uuid PRIMARY KEY);
CREATE UNLOGGED TABLE test_uuid7 ( id uuid PRIMARY KEY);
create sequence myseq cache 32767;

************* sequence generation vs UUID generation Test**************
explain analyze select count(nextval('myseq') ) from
generate_series(1,100000);

QUERY PLAN
Aggregate (cost=1500.00..1500.01 rows=1 width=8) (actual
time=59.037..59.039 rows=1 loops=1)
  -> Function Scan on generate_series (cost=0.00..1000.00 rows=100000
width=0) (actual time=18.541..34.200 rows=100000 loops=1)
Planning Time: 0.099 ms
Execution Time: 59.687 ms

explain analyze select count(gen_random_uuid()) from
generate_series(1,100000);

QUERY PLAN
Aggregate (cost=1500.00..1500.01 rows=1 width=8) (actual
time=900.633..900.634 rows=1 loops=1)
  -> Function Scan on generate_series (cost=0.00..1000.00 rows=100000
width=0) (actual time=12.893..65.820 rows=100000 loops=1)
Planning Time: 0.051 ms
Execution Time: 904.868 ms

explain analyze select count(uuid_generate_v7()) from
generate_series(1,100000);

QUERY PLAN
Aggregate (cost=26250.00..26250.01 rows=1 width=8) (actual
time=1710.609..1710.610 rows=1 loops=1)
  -> Function Scan on generate_series (cost=0.00..1000.00 rows=100000
width=0) (actual time=21.807..69.168 rows=100000 loops=1)
Planning Time: 0.048 ms
Execution Time: 1711.187 ms

*************** Insert Test***************

explain analyze INSERT INTO test_bigint select nextval('myseq') from
generate_series(1,100000);

QUERY PLAN
Insert on test_bigint (cost=0.00..2250.00 rows=0 width=0) (actual
time=551.707..551.708 rows=0 loops=1)
  -> Function Scan on generate_series (cost=0.00..1250.00 rows=100000
width=8) (actual time=12.438..91.337 rows=100000 loops=1)
Planning Time: 0.053 ms
Execution Time: 552.240 ms

explain analyze INSERT INTO test_uuid select gen_random_uuid() from
generate_series(1,100000);

QUERY PLAN
Insert on test_uuid (cost=0.00..2250.00 rows=0 width=0) (actual
time=2040.743..2040.744 rows=0 loops=1)
  -> Function Scan on generate_series (cost=0.00..1250.00 rows=100000
width=16) (actual time=12.829..982.446 rows=100000 loops=1)
Planning Time: 0.051 ms
Execution Time: 2041.242 ms

explain analyze INSERT INTO test_uuid7 select uuid_generate_v7() from
generate_series(1,100000);

QUERY PLAN
Insert on test_uuid7 (cost=0.00..27000.00 rows=0 width=0) (actual
time=3234.456..3234.457 rows=0 loops=1)
  -> Function Scan on generate_series (cost=0.00..26000.00 rows=100000
width=16) (actual time=12.453..1933.217 rows=100000 loops=1)
Planning Time: 0.051 ms
Execution Time: 3234.971 ms

************ Index performance********

Explain analyze select * from test_bigint where id in (select id from
test_bigint limit 1000);

EXPLAIN
QUERY PLAN
Nested Loop (cost=27.22..1322.70 rows=1000 width=8) (actual
time=0.414..2.307 rows=1000 loops=1)
  -> HashAggregate (cost=26.92..28.92 rows=200 width=8) (actual
time=0.370..0.520 rows=1000 loops=1)
        Group Key: test_bigint_1.id
        Batches: 1 Memory Usage: 145kB
        -> Limit (cost=0.00..14.42 rows=1000 width=8) (actual
time=0.012..0.163 rows=1000 loops=1)
              -> Seq Scan on test_bigint test_bigint_1
(cost=0.00..1444.18 rows=100118 width=8) (actual time=0.011..0.093
rows=1000 loops=1)
  -> Index Only Scan using test_bigint_pkey on test_bigint
(cost=0.29..6.53 rows=1 width=8) (actual time=0.001..0.001 rows=1
loops=1000)
        Index Cond: (id = test_bigint_1.id)
        Heap Fetches: 1000
Planning Time: 0.395 ms
Execution Time: 2.395 ms

Explain analyze select * from test_uuid where id in (select id from
test_uuid limit 1000);

QUERY PLAN
Nested Loop (cost=28.32..1459.58 rows=1000 width=16) (actual
time=0.431..3.355 rows=1000 loops=1)
  -> HashAggregate (cost=27.91..29.91 rows=200 width=16) (actual
time=0.399..0.556 rows=1000 loops=1)
        Group Key: test_uuid_1.id
        Batches: 1 Memory Usage: 145kB
        -> Limit (cost=0.00..15.41 rows=1000 width=16) (actual
time=0.011..0.185 rows=1000 loops=1)
              -> Seq Scan on test_uuid test_uuid_1 (cost=0.00..1541.85
rows=100085 width=16) (actual time=0.010..0.093 rows=1000 loops=1)
  -> Index Only Scan using test_uuid_pkey on test_uuid (cost=0.42..7.21
rows=1 width=16) (actual time=0.002..0.002 rows=1 loops=1000)
        Index Cond: (id = test_uuid_1.id)
        Heap Fetches: 1000
Planning Time: 0.234 ms
Execution Time: 3.419 ms

Explain analyze select * from test_uuid7 where id in (select id from
test_uuid7 limit 1000);

QUERY PLAN
Nested Loop (cost=28.32..1416.01 rows=1000 width=16) (actual
time=0.403..2.586 rows=1000 loops=1)
  -> HashAggregate (cost=27.91..29.91 rows=200 width=16) (actual
time=0.371..0.546 rows=1000 loops=1)
        Group Key: test_uuid7_1.id
        Batches: 1 Memory Usage: 145kB
        -> Limit (cost=0.00..15.41 rows=1000 width=16) (actual
time=0.011..0.161 rows=1000 loops=1)
              -> Seq Scan on test_uuid7 test_uuid7_1 (cost=0.00..1541.85
rows=100085 width=16) (actual time=0.010..0.091 rows=1000 loops=1)
  -> Index Only Scan using test_uuid7_pkey on test_uuid7 (cost=0.42..6.99
rows=1 width=16) (actual time=0.002..0.002 rows=1 loops=1000)
        Index Cond: (id = test_uuid7_1.id)
        Heap Fetches: 1000
Planning Time: 0.101 ms
Execution Time: 2.661 ms

On Thu, 2 Feb 2023 at 20:52, Benedict Holland <benedict(dot)m(dot)holland(at)gmail(dot)com>
wrote:

> Well... until two processes generate an identical UUID. That happened to
> me several times. It's rare but when that happens, oh boy that is a mess to
> figure out.
>
> Thanks,
> Ben
>
> On Thu, Feb 2, 2023, 10:17 AM Miles Elam <miles(dot)elam(at)productops(dot)com>
> wrote:
>
>> On Wed, Feb 1, 2023 at 10:48 AM Kirk Wolak <wolakk(at)gmail(dot)com> wrote:
>>
>>>
>>>
>>> On Wed, Feb 1, 2023 at 1:34 PM veem v <veema0000(at)gmail(dot)com> wrote:
>>>
>>>>
>>>> 1) sequence generation vs UUID generation, execution time increased
>>>> from ~291ms to 5655ms.
>>>> 2) Insert performance of "sequence" vs "UUID" execution time increased
>>>> from ~2031ms to 10599ms.
>>>> 3) Index performance for sequence vs UUID, execution time increased
>>>> from ~.3ms to .5ms.
>>>>
>>>>
>>> Yes, assuming that UUIDs would be efficient as keys when they are
>>> randomly generated, versus sequences (which tend to expand in one
>>> direction, and have been relatively optimized for years).
>>>
>>> This article explains in detail what is going on. If I were doing this,
>>> I would strongly consider a ULID because of getting the best of both worlds.
>>> https://blog.daveallie.com/ulid-primary-keys
>>>
>>> Of course, YMMV... And since ULIDs are not native to PG, there is
>>> overhead, but it is far more performant, IMO...
>>>
>>
>> Biased comparison. ULIDs have a timestamp component. The closest UUID
>> equivalent in Postgres is UUIDv1 from the uuid-ossp extension, not v4.
>> Another difference not mentioned in the blog article is that UUID is
>> versioned, meaning you can figure out what kind of data is in the UUID,
>> whereas ULIDs are a "one size fits all" solution.
>>
>> There is an implementation of sequential UUIDs for Postgres I posted
>> earlier in this thread. In addition, here is an implementation of UUIDv7
>> for Postgres:
>>
>> https://gist.github.com/kjmph/5bd772b2c2df145aa645b837da7eca74
>>
>> I would suggest running your tests against v1, v7, and sequential UUID
>> before jumping on ULID, which has no native type/indexing in Postgres.
>>
>> It should also be noted that apps cannot provide a bigint ID due to
>> collisions, but an app can generate UUIDs and ULIDs without fear,
>> essentially shifting the generation time metric in UUID/ULID's favor over a
>> bigserial.
>>
>> - Miles
>>
>>
>>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Joe Wildish 2023-02-02 20:11:45 Re: Logical Replication - "invalid ordering of speculative insertion changes"
Previous Message Tom Lane 2023-02-02 15:29:00 Re: From Clause Conditional