Re: serial + db key, or guid?

From: Christopher Browne <cbbrowne(at)gmail(dot)com>
To: Christophe Pettus <xof(at)thebuild(dot)com>
Cc: Mark Phillips <mark(dot)phillips(at)mophilly(dot)com>, pgsql-general <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: serial + db key, or guid?
Date: 2020-08-11 17:01:54
Message-ID: CAFNqd5X70eVqKqUKoXKV6eBLD1FY69wp7L0mHfsN3a4XvdqBrg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Tue, 11 Aug 2020 at 12:40, Christophe Pettus <xof(at)thebuild(dot)com> wrote:

> > On Aug 11, 2020, at 09:37, Mark Phillips <mark(dot)phillips(at)mophilly(dot)com>
> wrote:
> >
> > I posed the question on the chance things had evolved since 2012,
> specifically as it relates to postgres.
>
> The essentials haven't changed. Keys (such as UUIDs, especially UUID v4)
> that have most of their randomness in the most significant bits can cause
> significant cache hit problems on large indexes. 128 bit keys are usually
> overkill for most applications, unless you need actual *global* uniqueness
> across more than a single database or installation; 64 bit keys are usually
> sufficient.
>

Thus, if performance is highly significant (e.g. - there's lots of data in
the table, and it is heavily read/written) then it may be tempting to use a
sequential value instead because that can be smaller, faster to compare,
and won't trash caches (e.g. - with UUIDs, seemingly adjacent data will be
spread wider across indexes and will need more cache accesses to get to the
data).

If the table is small, or data is infrequently queried/updated, these costs
may be irrelevant.

> UUIDs (and similar very large random keys) do have the advantage that they
> are somewhat self-secure: You can expose them to outsiders without having
> to worry about other keys being guessable.
>

Not overly obvious from this; the "not guessable" part comes in that
chronologically adjacent records won't have any apparent similarity.

With serially assigned transaction IDs, if you, as a user, buy something,
and discover that your transaction ID was 1460795, you might well guess
that other recent purchases were on transactions 1460794, 1460793, and
such, and maybe get at someone else's data by messing with a web URL or
such. Whereas, here's 5 uuids I just generated (dbus-uuidgen isn't
generating RFC 4122 compliant values, but in context of a little
illustration, who cares?)
$ for i in 1 2 3 4 5; do
\ dbus-uuidgen
\ done
0ff745301515c646498cd1165f32cc6e
a9ca459ab6330f24d24af5095f32cc6e
b1cff235d77b1f4d8504920a5f32cc6e
58773af20b34b3c550f4eebf5f32cc6e
f9a13ce961b28751b102c5545f32cc6e

There are some identical low-order bits, but they are pretty well hidden by
the high-order stuff.

It's somewhat cache-destroying, but not especially well guessable.

There is something which has evolved since 2012; see <
https://www.postgresql.org/docs/9.5/uuid-ossp.html>, notably
F.43.2. Building uuid-ossp

It used to be that adding UUID generator support required the OSSP library,
which was sufficiently inconvenient that this would often not be built-in.
Since 9.4 (released in 2014), uuid-ossp can use common built-in libraries
on Linux, OSX, BSD systems, so it's more likely that it will be included
"out of the box" on package-managed deployments of PostgreSQL.
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Adam Brusselback 2020-08-11 17:42:46 Re: serial + db key, or guid?
Previous Message Christophe Pettus 2020-08-11 16:40:44 Re: serial + db key, or guid?