Re: Guidance on INSERT RETURNING order

From: "Mike Bayer" <mike_mp(at)zzzcomputing(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Federico <cfederico87(at)gmail(dot)com>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Guidance on INSERT RETURNING order
Date: 2023-04-11 21:07:15
Message-ID: 779f312a-4c10-437e-bfc9-3f4c8eaea18b@app.fastmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Tue, Apr 11, 2023, at 4:22 PM, Tom Lane wrote:
> Federico <cfederico87(at)gmail(dot)com> writes:
> > Searching the archive seems that a using the INSERT SELECT ORDER BY
> > form should be a better solution,
> > so the above insert should be rewritten as
>
> > INSERT INTO t(data)
> > SELECT data FROM (VALUES ('a', 1), ('b', 2), ('c', 3)) as vv(data,
> > num) ORDER BY num
> > RETURNING id
>
> > to ensure that the id are created in the order specified by num. The
> > returned id can again be in
> > arbitrary order, but sorting them should enable correctly matching the
> > orm object so that they can
> > be properly updated.
> > Is this correct?
>
> No. Sadly, adding that ORDER BY is just voodoo programming, because
> it applies to the result of the SELECT while promising nothing about
> the order in which INSERT/RETURNING will act on those rows.
>
> Re-reading that 2012 thread, the main new observation I'd make today
> is that parallel operation is a thing now, and it's not hard to foresee
> that sometime soon we'll want to parallelize INSERTs. Which'd make it
> *really* hard to promise anything about the order of RETURNING output.

if I can state this without having RETURNING getting in the way, because we know RETURNING is not ordered.

Start with this table:

CREATE TABLE mytable (
id SERIAL PRIMARY KEY,
a INT,
b INT
)

Then insert two rows, where id SERIAL fires implicitly, assume the next value the sequence will give us is N1, and then the value after that is N2. It doesn't matter what N1 and N2 are (don't need to be consecutive) but we want N2 > N1, that is, increasing.

INSERT INTO mytable (a, b) SELECT p1, p2 FROM (VALUES (10, 11, 1), (12, 13, 2)) AS sel(p1, p2, num) ORDER BY num

Then SELECT with ORDER BY:

SELECT id, a, b FROM mytable ORDER BY id

We want the results to be:

(N1, 10, 11)
(N2, 12, 13)

and we dont want them to *ever* be:

(N1, 12, 13)
(N2, 10, 11)

that is, we want the SERIAL column (or an IDENTITY also) to be lined up with the VALUES.

From what you wrote in https://www.postgresql.org/message-id/29386.1528813619%40sss.pgh.pa.us , that seems to be exactly what you've stated, where this statement:

INSERT INTO table (a, b) SELECT p1, p2 FROM (VALUES (10, 11, 1), (12, 13, 2)) AS sel(p1, p2, num) ORDER BY num

is organized by the query planner to essentially be equivalent to this, where the nextval() is part of the SELECTed data:

INSERT INTO mytable (id, a, b) SELECT nextval('mytable_id_seq'), p1, p2 FROM
(SELECT p1, p2 FROM (VALUES (10, 11, 1), (12, 13, 2)) AS sel(p1, p2, num) ORDER BY num) as _x

in practice, we add "RETURNING id" and expect those "id" cols to be in increasing order, so we sort to match it up with the input rows.

>
> I think if you want to use RETURNING with multi-row inserts, the
> thing to do is more like
>
> INSERT INTO t(data) VALUES ('a'), ('b'), ('c') RETURNING data, id
>
> and then explicitly match up the returned "data" values rather than
> presuming they appear in the same order you wrote them in in VALUES.

we're going to do that also when the table has something like a uuid for a primary key or otherwise.

> Admittedly this might be problematic if some of the VALUES rows
> are identical, but how much should you care?

we only do any of this if the rows have something unique in them we can hook onto.

>
> regards, tom lane
>

thanks so much for replying!

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Federico 2023-04-11 21:07:31 Re: Guidance on INSERT RETURNING order
Previous Message Thorsten Glaser 2023-04-11 20:57:26 Re: Guidance on INSERT RETURNING order