UUID v1 optimizations...

From: Ancoron Luciferis <ancoron(dot)luciferis(at)googlemail(dot)com>
To: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: UUID v1 optimizations...
Date: 2019-05-25 13:45:53
Message-ID: 8d5f4b67-d540-8d3d-9979-808b9c1e1ca7@googlemail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi all,

Some time ago, I was having trouble with some rather high load OLTP
application (in Java, but that doesn't really matter) that was using v1
UUID's for primary keys and after some time, the bloat of certain
indexes went quite high.

So I investigated the PostgreSQL code to see how it is handling UUID's
with respect to storage, sorting, aso. but all I could find was that it
basically falls back to the 16-byte.

After struggling to find a way to optimize things inside the database, I
reverted to introduce a hack into the application by not shuffling the
timestamp bytes for the UUID's, which makes it look quite serial in
terms of byte order.

With that, we were able to reduce bloat by magnitudes and finally VACUUM
also was able to reclaim index space.

So, my question now is: Would it make sense for you to handle these
time-based UUID's differently internally? Specifically un-shuffling the
timestamp before they are going to storage?

A second question would be whether native support functions could be
introduced? Most interesting for me would be:
* timestamp
* version
* variant

Here are some numbers from the tests I have run (against a PostgreSQL 10
server):

1. insert 5,000,000 rows
2. delete 2,500,000 rows

Index Bloat:
idxname | bloat_ratio | bloat_size | real_size
---------------------+-------------+------------+-----------
uuid_v1_pkey | 23.6 | 46 MB | 195 MB
uuid_serial_pkey | 50.4 | 76 MB | 150 MB

Higher ratio for "serial", but still lower total index size. :)

Now, the performance of VACUUM is also very interesting here:

# vacuum (verbose, analyze, freeze) uuid_serial;
INFO: vacuuming "public.uuid_serial"
INFO: index "uuid_serial_pkey" now contains 2500001 row versions in
19253 pages
DETAIL: 0 index row versions were removed.ce(toast.reltuples, 0) / 4 )
* bs ) as expected_bytes
9624 index pages have been deleted, 9624 are currently reusable.
CPU: user: 0.03 s, system: 0.01 s, elapsed: 0.05 s.t.oid
INFO: "uuid_serial": found 0 removable, 2500001 nonremovable row
versions in 13515 out of 27028 pages
DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 270712
There were 94 unused item pointers.
Skipped 0 pages due to buffer pins, 13513 frozen pages.
0 pages are entirely empty.be reused
CPU: user: 0.37 s, system: 0.16 s, elapsed: 2.83 s.
INFO: analyzing "public.uuid_serial"e compressed
INFO: "uuid_serial": scanned 27028 of 27028 pages, containing 2500001
live rows and 0 dead rows; 30000 rows in sample, 2500001 estimated total
rows
VACUUM schemaname, tablename, can_estimate,
Time: 3969.812 ms (00:03.970)

# vacuum (verbose, analyze, freeze) uuid_v1;
INFO: vacuuming "public.uuid_v1"
INFO: scanned index "uuid_v1_pkey" to remove 2499999 row versions
DETAIL: CPU: user: 1.95 s, system: 0.13 s, elapsed: 5.09 s
INFO: "uuid_v1": removed 2499999 row versions in 27028 pages
DETAIL: CPU: user: 0.22 s, system: 0.26 s, elapsed: 3.93 s
INFO: index "uuid_v1_pkey" now contains 2500001 row versions in 24991 pages
DETAIL: 2499999 index row versions were removed.
12111 index pages have been deleted, 0 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: "uuid_v1": found 1791266 removable, 2500001 nonremovable row
versions in 27028 out of 27028 pages
DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 270716
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 2.90 s, system: 0.71 s, elapsed: 14.54 s.
INFO: analyzing "public.uuid_v1"
INFO: "uuid_v1": scanned 27028 of 27028 pages, containing 2500001 live
rows and 0 dead rows; 30000 rows in sample, 2500001 estimated total rows
VACUUM
Time: 15702.803 ms (00:15.703)

...almost 5x faster!

Now insert another 20 million:

COPY uuid_serial FROM '...' WITH ( FORMAT text );
COPY 20000000
Time: 76249.142 ms (01:16.249)

COPY uuid_v1 FROM '...' WITH ( FORMAT text );
COPY 20000000
Time: 804291.611 ms (13:24.292)

...more than 10x faster!

...and the resulting bloat (no VACUUM in between):
idxname | bloat_ratio | bloat_size | real_size
---------------------+-------------+------------+-----------
uuid_v1_pkey | 30.5 | 295 MB | 966 MB
uuid_serial_pkey | 0.9 | 6056 kB | 677 MB

...still 30% savings in space.

Cheers,

Ancoron

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Peter Eisentraut 2019-05-25 14:19:59 Re: UUID v1 optimizations...
Previous Message Mark Kirkwood 2019-05-25 03:55:52 Re: Use Postgres as a column store by creating one table per column