Re: Review: GiST support for UUIDs

From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Paul Jungwirth <pj(at)illuminatedcomputing(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Review: GiST support for UUIDs
Date: 2015-09-14 20:31:59
Message-ID: 55F72EBF.70608@sigaev.ru
Lists: pgsql-hackers

Paul Jungwirth wrote:
>> 2)
>> static double
>> uuid2num(const pg_uuid_t *i)
>> {
>> return *((uint64 *)i);
>> }
>> This doesn't look like a correct transformation to me. Maybe it's better
>> to transform to a numeric type (a UUID looks like a 16-digit hexadecimal
>> number)
>> and follow the gbt_numeric_penalty() logic (or even call it directly).
>
> Thanks for the review! A UUID is actually not stored as a string of
> hexadecimal digits. (It is normally displayed that way, but with 32
> digits, not 16.) Rather it is stored as an unstructured 128-bit value
> (which in C is 16 unsigned chars). Here is the easy-to-misread
> declaration from src/backend/utils/adt/uuid.c:

I got the number of digits wrong, but that doesn't matter for the idea.
The original code uses only 8 of the 16 bytes to compute the penalty, which
could hurt index performance. A simple way is to print each group of
4 bits into a string with the %02d modifier and then make a numeric value
with the help of numeric_in.
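
A minimal sketch of the string-building step, under one reading of the
suggestion: each 4-bit nibble is written as two decimal digits ("00" to
"15"), high nibble first, so the digit string sorts the same way as the raw
bytes. The type uuid_bytes and the function uuid_to_decimal_string are
hypothetical stand-ins, not btree_gist code, and the hand-off to numeric_in
is left out.

```c
#include <stdio.h>

#define UUID_LEN 16

/* Hypothetical stand-in for pg_uuid_t: 16 raw bytes. */
typedef struct
{
	unsigned char data[UUID_LEN];
} uuid_bytes;

/*
 * Write every nibble as two decimal digits ("00".."15").
 * out must have room for 4 * UUID_LEN + 1 = 65 bytes.
 */
static void
uuid_to_decimal_string(const uuid_bytes *u, char *out)
{
	int			i;

	for (i = 0; i < UUID_LEN; i++)
	{
		/* high nibble first, so string order follows byte order */
		sprintf(out + 4 * i, "%02d%02d",
				u->data[i] >> 4, u->data[i] & 0x0F);
	}
}
```

The result is always 64 digits long, so all 16 bytes contribute to the
value that would be parsed afterwards.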

Or something like this in pseudocode:

numeric = int8_numeric(*(uint64 *)(&i->data[0])) *
int8_numeric(MAX_INT64) + int8_numeric(*(uint64 *)(&i->data[8]))
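
For comparison, here is a rough standalone sketch of the same "high half
times a big constant plus low half" idea using a double instead of numeric.
It is an assumption-laden illustration, not the patch: uuid_bytes, read_be64
and uuid_to_double are hypothetical names, the bytes are read big-endian so
that data[0] is the most significant, and the multiplier is 2^64. A double
keeps only about 53 significant bits, so unlike numeric the low half is
rounded away when the high half is large.

```c
#include <stdint.h>

/* Hypothetical stand-in for pg_uuid_t: 16 raw bytes. */
typedef struct
{
	unsigned char data[16];
} uuid_bytes;

/* Read 8 bytes as a big-endian unsigned integer. */
static uint64_t
read_be64(const unsigned char *p)
{
	uint64_t	v = 0;
	int			i;

	for (i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Approximate the whole 128-bit value as hi * 2^64 + lo. */
static double
uuid_to_double(const uuid_bytes *u)
{
	const double two64 = 18446744073709551616.0;	/* 2^64 */
	uint64_t	hi = read_be64(&u->data[0]);
	uint64_t	lo = read_be64(&u->data[8]);

	return (double) hi * two64 + (double) lo;
}
```

With this mapping, two UUIDs that differ only in the last 8 bytes still get
different values as long as the first 8 bytes are small enough for the
difference to survive rounding.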

> The only other 128-bit type I found in btree_gist was Interval. For that
> type we convert to a double using INTERVAL_TO_SEC, then call
> penalty_num. By my read that accepts a similar loss of precision.

Right, but the precision of a double is enough to represent a one-century
interval with 0.00001-second accuracy, which is enough for practical
use. In the UUID case you take into account only half of the value. Of
course, GiST will work even with a penalty function that returns a constant,
but then each scan could become a full index scan.
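
The arithmetic behind the one-century claim can be checked with a short
sketch. It assumes 365.25-day years and the usual IEEE 754 double, where
every integer up to 2^53 is exactly representable; the function names are
made up for illustration.

```c
/* Number of 0.00001-second ticks in one century (365.25-day years). */
static double
ticks_per_century(void)
{
	double		seconds = 100.0 * 365.25 * 86400.0;	/* 3155760000 */

	return seconds * 100000.0;	/* ticks of 0.00001 s */
}

/* Integers up to 2^53 are exactly representable in an IEEE 754 double. */
static double
double_exact_limit(void)
{
	return 9007199254740992.0;	/* 2^53 */
}
```

A century is about 3.2e14 ticks, well below 2^53 (about 9.0e15), whereas a
UUID has 128 significant bits.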

>
> If I'm mistaken about 128-bit integer support, let me know, and maybe we
> can do the penalty computation on the whole UUID. Or maybe I should just
> convert the uint64 to a double before calling penalty_num? I don't
> completely understand what the penalty calculation is all about, so I
> welcome suggestions here.

The penalty method calculates how much the union key would be enlarged if
the insert went into the current subtree. It directly affects the
selectivity of the subtree.
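
As a simplified illustration of that idea (not the actual penalty_num macro
from btree_gist), assume the union key of a subtree is a closed range over
doubles. The names range_key and range_penalty are hypothetical.

```c
/* Hypothetical 1-D union key: closed range [lower, upper] of a subtree. */
typedef struct
{
	double		lower;
	double		upper;
} range_key;

/*
 * How much the union range has to grow to also cover new_key.
 * Returns 0 when new_key already fits inside orig.
 */
static double
range_penalty(const range_key *orig, const range_key *new_key)
{
	double		grow = 0.0;

	if (new_key->lower < orig->lower)
		grow += orig->lower - new_key->lower;
	if (new_key->upper > orig->upper)
		grow += new_key->upper - orig->upper;
	return grow;
}
```

An insert is routed to the subtree with the smallest penalty. If the mapping
to a number ignores part of the value, many distinct keys produce the same
penalty, the choice of subtree becomes arbitrary, and the union keys end up
overlapping more than they need to.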

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/
