From: | Peter Crabtree <peter(dot)crabtree(at)gmail(dot)com> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Generating Lots of PKs with nextval(): A Feature Proposal |
Date: | 2010-05-14 18:56:18 |
Message-ID: | AANLkTikJhir38iN0aqobpLKYDYl07Rss08mP1orgKpBZ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Recently, in preparation for migrating an application to postgres, I
got to this part of the manual (which is *excellent* so far, by the
way):
http://www.postgresql.org/docs/8.4/interactive/functions-sequence.html
A quick check with the folks on #postgresql confirmed my
understanding, which was that the locking semantics of setval() and
nextval() make this unsafe:
SELECT setval('my_seq', nextval('my_seq') + 500);
Now, I was reminded that I could simply do this:
SELECT nextval('my_seq') FROM generate_series(1, 500);
But of course then I would have no guarantee that I would get a
contiguous block of ids, which means if I'm using this to do a mass
insert of records which refer to each others' ids (example: storing a
directed, linear graph), I either have to do a correlated update on
the client side, after transferring the keys (consider the cost of
doing this for a few million records - 4 MB in keys per million
records, for, in extreme cases, 12 MB of data to be inserted -- 33%
overhead in the worst case, presuming symmetric bandwidth), or I have
to insert into a temporary table, then have the db backend do the
update, then insert from there to the real table. Both are imperfect
options in terms of performance and complexity.
Thus, before I start work on it, I propose an extension to the current
nextval():
SELECT nextval('my_seq', 500);
This would increment the my_seq sequence by its interval * 500, and
return the first valid key. This both makes client code that needs a
bunch of PKs simpler to implement, and saves in performance, since the
client can just replace all its PKs (presuming they're currently a
contiguous block from 1 to n) with my_starting_pk + current_pk, so
this:
pk | next_node
----+-----------
0 | 1
1 | 2
2 | 0
can be easily updated like this:
SELECT nextval('my_seq', (SELECT count(*) FROM my_table));
UPDATE my_table SET pk = currval('my_seq') + pk, next_node =
currval('my_seq') + next_node;
to something like this:
pk | next_node
--------+----------
521650 | 521651
521651 | 521652
521652 | 521650
This is a net gain of performance and ease of implementation in many
cases where a large number of ids from a sequence are needed -- with a
small added benefit of the keys being guaranteed to be contiguous.
I don't see any technical problems with this; postgres already can
"pre-allocate" more than one key, but the number is semi-static (the
CACHE parameter to CREATE SEQUENCE). This might break existing user
code if they've defined a nextval(regclass, integer), but I don't see
any way to
Finally, I've checked sequence.c -- this looks pretty straightforward
to implement, but I figured checking with this list was wise before
starting work. Apologies if I've been overly wordy.
Peter
From | Date | Subject | |
---|---|---|---|
Next Message | Alvaro Herrera | 2010-05-14 18:56:55 | Re: fillfactor gets set to zero for toast tables |
Previous Message | Robert Haas | 2010-05-14 18:48:27 | Re: Generalized Inverted Generalized Search Tree |