Getting sequence-generated IDs from multiple row insert

From: Ben Hoyt <benhoyt(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Getting sequence-generated IDs from multiple row insert
Date: 2014-03-31 19:28:14
Message-ID: CAL9jXCHVg0CVEke+eiMjYA_UBffnacGYEN9Psb1h-kFw8QLxrw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi folks,

I've just run into a subtle but fairly serious race condition while using
web.py's SQL library to insert multiple rows into the database and return
their IDs (a "serial primary key" column). Specifically I'm using the
multiple_insert() function that web.py defines here:

https://github.com/webpy/webpy/blob/master/web/db.py#L793

This function runs a query like this:

INSERT INTO table (col1, col2) VALUES (col1_1, col2_1), (col1_2,
col2_2), ...; SELECT currval('table_id_seq');

Using the output of the currval(), web.py tries to build a list of the most
recent IDs by creating a range from "currval - num_rows_inserted + 1"
through "currval". In Python:

out = range(out-len(values)+1, out+1)

This *looks* nice, and must have seemed fine to the developers who
implemented it, but I've just hit a case where two sessions each doing a
multiple insert don't use sequential IDs. For example, the range code above
for the first insert gave 2117552...2117829. And the second insert gave
2117625...2117818. Which are obviously overlapping and is a nasty bug
waiting to happen. Thankfully it caused an IntegrityError further down in
my code so I didn't screw things up.

First of all, I presume this is expected, and is how the sequence with a
multi-row insert is supposed to work? In other words, the sequence
guarantees the IDs will be unique, but with multi-row insert, they won't
necessarily be consecutive? If so, it's a fairly serious bug in web.py's
multiple_insert(), which probably shouldn't return anything due to this
issue.

Second, what's the right thing to do here? The first thing I found was
PostgreSQL's "RETURNING" clause, but somewhat frustratingly for this use
case, even that's not guaranteed to return the results in the order you
specified. I need the IDs in insertion order so I can do further
processing. Tom Lane and others in this thread indicate that this is not a
guarantee of the RETURNING clause, for future optimization reasons and due
to how SQL handles sets:

http://postgresql.1045698.n5.nabble.com/PATCH-Enforce-that-INSERT-RETURNING-preserves-the-order-of-multi-rows-td5728579.html

So currently I've changed my code to use RETURNING and then I'm ordering
the results based on a secondary column that I know the order of. This
works, but seems clunky, so I'm wondering if there's a nicer way.

Thanks,
Ben

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Edson Richter 2014-03-31 19:44:27 Re: Why does "checkpointer" is consumig ~1.2Gb of RAM?
Previous Message Leonardo M. Ramé 2014-03-31 19:22:42 Re: Complex query