Re: Sequence vs UUID

From: Kirk Wolak <wolakk(at)gmail(dot)com>
To: veem v <veema0000(at)gmail(dot)com>
Cc: Ron <ronljohnsonjr(at)gmail(dot)com>, Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>, Dominique Devienne <ddevienne(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Sequence vs UUID
Date: 2023-02-01 18:47:44
Message-ID: CACLU5mSVke5W-gY4cvkL_XhgZx26fKkYbyZ_KLaNjxmK=gfEog@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Wed, Feb 1, 2023 at 1:34 PM veem v <veema0000(at)gmail(dot)com> wrote:

> I tried to test quickly below on dbfiddle, below with the UUID as data
> type and in each of the below cases the UUID performance seems
> drastically reduced as compared to sequence performance. Let me know if
> anything is wrong in my testing here?
>
> 1) sequence generation vs UUID generation, execution time increased from
> ~291ms to 5655ms.
> 2) Insert performance of "sequence" vs "UUID" execution time increased
> from ~2031ms to 10599ms.
> 3) Index performance for sequence vs UUID, execution time increased from
> ~.3ms to .5ms.
>
>
Yes, assuming that UUIDs would be efficient as keys when they are randomly
generated, versus sequences (which tend to expand in one direction, and
have been relatively optimized for years).

This article explains in detail what is going on. If I were doing this, I
would strongly consider a ULID because of getting the best of both worlds.
https://blog.daveallie.com/ulid-primary-keys

Of course, YMMV... And since ULIDs are not native to PG, there is
overhead, but it is far more performant, IMO...

Also, I hold out hope that one of the Gods of PostgreSQL on this list might
give us an internal ULID implementation fixing that last problem...

HTH

>
> PostgreSQL 15.0 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0
> 20210514 (Red Hat 8.5.0-10), 64-bit
>
> CREATE UNLOGGED TABLE test_bigint ( id bigint PRIMARY KEY);
> CREATE UNLOGGED TABLE test_uuid ( id uuid PRIMARY KEY);
> create sequence myseq cache 32767;
>
> ************ sequence generation vs UUID generation Test**************
> explain analyze select count(nextval('myseq') ) from
> generate_series(1,100000);
>
> QUERY PLAN
> Aggregate (cost=1500.00..1500.01 rows=1 width=8) (actual
> time=291.030..291.030 rows=1 loops=1)
>   -> Function Scan on generate_series (cost=0.00..1000.00 rows=100000
> width=0) (actual time=53.332..63.941 rows=100000 loops=1)
> Planning Time: 0.155 ms
> Execution Time: 291.719 ms
>
> explain analyze select count(gen_random_uuid()) from
> generate_series(1,100000);
>
> QUERY PLAN
> Aggregate (cost=1500.00..1500.01 rows=1 width=8) (actual
> time=5654.453..5654.454 rows=1 loops=1)
>   -> Function Scan on generate_series (cost=0.00..1000.00 rows=100000
> width=0) (actual time=84.328..514.214 rows=100000 loops=1)
> Planning Time: 0.082 ms
> Execution Time: 5655.158 ms
>
> *************** Insert Test***************
>
> explain analyze INSERT INTO test_bigint select nextval('myseq') from
> generate_series(1,100000);
>
> QUERY PLAN
> Insert on test_bigint (cost=0.00..2250.00 rows=0 width=0) (actual
> time=2030.960..2030.961 rows=0 loops=1)
>   -> Function Scan on generate_series (cost=0.00..1250.00 rows=100000
> width=8) (actual time=48.102..636.311 rows=100000 loops=1)
> Planning Time: 0.065 ms
> Execution Time: 2031.469 ms
>
> explain analyze INSERT INTO test_uuid select gen_random_uuid() from
> generate_series(1,100000);
>
>
> QUERY PLAN
> Insert on test_uuid (cost=0.00..2250.00 rows=0 width=0) (actual
> time=10599.230..10599.230 rows=0 loops=1)
>   -> Function Scan on generate_series (cost=0.00..1250.00 rows=100000
> width=16) (actual time=36.975..6289.811 rows=100000 loops=1)
> Planning Time: 0.056 ms
> Execution Time: 10599.805 ms
>
> ************ Index performance********
>
> Explain analyze select * from test_bigint where id in (select id from
> test_bigint limit 100);
>
> QUERY PLAN
> Nested Loop (cost=2.98..734.71 rows=100 width=8) (actual
> time=0.083..0.269 rows=100 loops=1)
>   -> HashAggregate (cost=2.69..3.69 rows=100 width=8) (actual
> time=0.046..0.062 rows=100 loops=1)
>         Group Key: test_bigint_1.id
>         Batches: 1 Memory Usage: 24kB
>         -> Limit (cost=0.00..1.44 rows=100 width=8) (actual
> time=0.011..0.025 rows=100 loops=1)
>               -> Seq Scan on test_bigint test_bigint_1
> (cost=0.00..1444.18 rows=100118 width=8) (actual time=0.011..0.017
> rows=100 loops=1)
>   -> Index Only Scan using test_bigint_pkey on test_bigint
> (cost=0.29..7.31 rows=1 width=8) (actual time=0.002..0.002 rows=1
> loops=100)
>         Index Cond: (id = test_bigint_1.id)
>         Heap Fetches: 100
> Planning Time: 0.279 ms
> Execution Time: 0.302 ms
>
> Explain analyze select * from test_uuid where id in (select id from
> test_uuid limit 100);
>
> QUERY PLAN
> Nested Loop (cost=3.21..783.31 rows=100 width=16) (actual
> time=0.080..0.474 rows=100 loops=1)
>   -> HashAggregate (cost=2.79..3.79 rows=100 width=16) (actual
> time=0.046..0.066 rows=100 loops=1)
>         Group Key: test_uuid_1.id
>         Batches: 1 Memory Usage: 24kB
>         -> Limit (cost=0.00..1.54 rows=100 width=16) (actual
> time=0.010..0.025 rows=100 loops=1)
>               -> Seq Scan on test_uuid test_uuid_1 (cost=0.00..1541.85
> rows=100085 width=16) (actual time=0.009..0.016 rows=100 loops=1)
>   -> Index Only Scan using test_uuid_pkey on test_uuid (cost=0.42..7.79
> rows=1 width=16) (actual time=0.004..0.004 rows=1 loops=100)
>         Index Cond: (id = test_uuid_1.id)
>         Heap Fetches: 100
> Planning Time: 0.180 ms
> Execution Time: 0.510 ms
>
> On Tue, 31 Jan 2023 at 03:28, Ron <ronljohnsonjr(at)gmail(dot)com> wrote:
>
>>
>> And populate that column with UUIDs generated by the gen_random_uuid()
>> function.
>>
>> (Requires v13.)
>>
>> On 1/30/23 13:46, Adrian Klaver wrote:
>> > On 1/30/23 11:43, veem v wrote:
>> >> Thank You So much for the details. I am a bit new to postgres. And
>> these
>> >> test results I picked were from a dev system. If I understand it
>> >> correctly, do you mean these settings(usage of C locale or "native"
>> >> 16-byte uuid) which you mentioned should be there in a production
>> system
>> >> and thus we should test the performance of the UUID vs sequence on a
>> >> similar setup? Or say if this sort of degradation of UUID performance
>> is
>> >> not expected then , how to get these settings tweaked ON, so as to see
>> >> the best string type or UUID performance, can you please guide me here?
>> >
>> > No what is being said is change:
>> >
>> > source_id varchar(36)
>> >
>> > to
>> >
>> > source_id uuid
>> >
>> > as i:
>> >
>> > https://www.postgresql.org/docs/current/datatype-uuid.html
>> >
>> >>
>> >> On Mon, 30 Jan 2023 at 22:18, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us
>> >> <mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us>> wrote:
>> >>
>> >> Dominique Devienne <ddevienne(at)gmail(dot)com
>> >> <mailto:ddevienne(at)gmail(dot)com>> writes:
>> >> > On Mon, Jan 30, 2023 at 5:11 PM veem v <veema0000(at)gmail(dot)com
>> >> <mailto:veema0000(at)gmail(dot)com>> wrote:
>> >> >> CREATE TABLE test1_UUID ( id bigint,source_id varchar(36)
>> >> PRIMARY KEY, Name varchar(20) );
>> >>
>> >> > Maybe if you used a "native" 16-byte uuid, instead of its
>> textual
>> >> > serialization with dashes (36 bytes + length overhead), the gap
>> would
>> >> > narrow.
>> >>
>> >> Yeah, especially if your database is not using C locale. The
>> >> strcoll or ICU-based comparisons done on string types can be
>> >> enormously more expensive than the memcmp() used for binary
>> >> types like native uuid.
>> >>
>> >> regards, tom lane
>> >>
>> >
>>
>> --
>> Born in Arizona, moved to Babylonia.
>>
>>
>>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Adrian Klaver 2023-02-01 18:53:19 Re: Best Open Source OS for Postgresql
Previous Message Christophe Pettus 2023-02-01 18:47:17 Re: VACUUM vs VACUUM FULL (was: SELECT * FROM huge_table LIMIT 10; Why does it take more than 10 min to complete, with cold caches)