From: | "Dawid Kuroczko" <qnex42(at)gmail(dot)com> |
---|---|
To: | "Pg Hackers" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | RFD: hexstring(n) data type |
Date: | 2008-03-03 22:05:04 |
Message-ID: | 758d5e7f0803031405x1301d90fqac6bfac269c3a0fe@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Following the discussion on making UUID data type to be much more liberal
( http://archives.postgresql.org/pgsql-hackers/2008-02/msg01214.php )
I have decided to try to approach it from more general perspective.
The current state of code is available at:
http://qnex.net/hexstring-0.1-2008-03-03.tgz
And now for more details:
The idea is to have a data type HEXSTRING(n) which can have an optional
typemod specifying the size of data (in bytes).
Internally the datatype is binary compatible with bytea, but I/O is done as
hex-encoded strings. The format is liberal, isspace() and ispunct() characters
are skipped while the digits are read.
I have played with two versions of hexstringin() function, one which uses
strtoul() function and the other which uses "home brew" code. The latter
appears to be faster, so I stayed with that. But I would appreciate
comments on this from more experienced.
So, what are the use cases?
CREATE DOMAIN liberal_uuid AS hexstring(16);
CREATE DOMAIN liberal_macaddr AS hexstring(6);
...it allows for creating other standard hex-types, as for example:
CREATE DOMAIN wwn AS hexstring(8); --
http://en.wikipedia.org/wiki/World_Wide_Name
Also it can be a convenient alternative to bytea format (I know, the
encode()/decode() pair),
especially when you have to format output data as some fancy hex-string.
The code is currently just a bunch of input/output/typemod functions
which appear
to work. I will add casts, operators, etc -- they most likely will be
nicked from bytea.
What I would like to also add is ubiquitous to_char(hex, format) function.
For an UUID-compatilbe format it would be called as:
SELECT to_char(hex, 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx') or
SELECT to_char(hex, '8x-4x-4x-4x-12x') where x is expanded as
[0-9a-f] digit and X is expanded as [0-9A-F].
I am not sure what to do about variable length hexstrings, I am
considering something
like to_char(hex, '8X-') which would produce something like
'00000000-11111111-22222222'
for a 12-byte hexstring (what to do about dangling '-' ?).
...but the original case against liberal UUID was that it would make
the I/O slower.
My simple test:
postgres=# CREATE TABLE uuids AS SELECT uuid_generate_v4()::text AS u
FROM generate_series(1,10000000);
CREATE
postgres=# CREATE TEMP TABLE text ON COMMIT DROP AS SELECT
u::hexstring(16) FROM uuids;
SELECT
Time: 13058.486 ms
postgres=# CREATE TEMP TABLE test ON COMMIT DROP AS SELECT u::uuid FROM uuids;
SELECT
Time: 13536.816 ms
...now hexstring is varlena type but does not use strtoul. Perhaps
uuid might be more liberal too.
What do you think about it?
Regards,
Dawid
From | Date | Subject | |
---|---|---|---|
Next Message | Bruce Momjian | 2008-03-03 22:12:51 | Re: A couple of PG schedule reminders |
Previous Message | Joshua D. Drake | 2008-03-03 22:01:47 | Re: A couple of PG schedule reminders |