Re: UUID v1 optimizations...

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Morris de Oryx <morrisdeoryx(at)gmail(dot)com>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: UUID v1 optimizations...
Date: 2019-05-26 10:37:07
Message-ID: 20190526103707.4una3n34ycqdgmiy@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Sun, May 26, 2019 at 02:27:05PM +1000, Morris de Oryx wrote:
>I'm not worthy to post here, but a bit of a random thought.
>
>If I've followed the conversation correctly, the reason for a V1 UUID is
>partly to order and partition rows by a timestamp value, but without the
>cost of a timestamp column. As I was told as a boy, "Smart numbers aren't."
>Is it _absolutely_ the case that you can't afford another column? I don't
>know the ins and outs of the Postgres row format, but my impression is that
>it's a fixed size, so you may be able to add the column without splitting
>rows? Anyway, even if that's not true and the extra column costs you disk
>space, is it the index that concerns you? Have you considered a timestamp
>column, or a numeric column with an epoch offset, and a BRIN index? If you
>insert data is in pretty much chronological order, that might work well for
>you.
>
>Best of luck, I've enjoyed following the commentary.
>

No, an extra column is not a solution, because it has no impact on the
index on the UUID column. One of the problems with indexes on random
data is that the entries go to random parts of the index. In the extreme
case, each index insert goes to a different index page (since the last
checkpoint) and therefore has to write the whole page into the WAL.
That's what full-page writes do. This inflates the amount of WAL, may
trigger more frequent checkpoints and (of course) reduces the cache hit
ratio for index pages (because we have to touch many of them).

The point of generating UUIDs in a more sequential way is to limit this
behavior by "concentrating" the index inserts into a smaller part of the
index. That's why indexes on sequential data (say, generated from a
SERIAL column) perform better.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Morris de Oryx 2019-05-26 10:38:31 Re: UUID v1 optimizations...
Previous Message Morris de Oryx 2019-05-26 10:24:14 Re: UUID v1 optimizations...