Re: Guidance on INSERT RETURNING order

From: Kirk Wolak <wolakk(at)gmail(dot)com>
To: Federico <cfederico87(at)gmail(dot)com>
Cc: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>, pgsql-general(at)lists(dot)postgresql(dot)org, Mike Bayer <mike_mp(at)zzzcomputing(dot)com>
Subject: Re: Guidance on INSERT RETURNING order
Date: 2023-04-12 21:35:20
Message-ID: CACLU5mT5x-76uy_Rh821-WfnD3Z2iuMsZL4y1dsSBK2ypERcmA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Tue, Apr 11, 2023 at 4:38 PM Federico <cfederico87(at)gmail(dot)com> wrote:

> Thanks for the ansers
>
> > 2) What would you order by, id or data or both?
>
> by values order, (that incidentally seems to be what PG does)
>
> > with i as (INSERT INTO t(data) VALUES ('a', 1), ('b', 2), ('c', 3)
> > returning id)
> > select i.id from i order by id;
>
> The problem here is not having the auto increment id in a particular
> order, is that there
> is apparently no correlation with the position of an element in the
> values clause with the
> id generated. That's the reason for using the sentinel column in the
> general solution in the previous message.
>
> The extend on the use case, SQLAlchemy has 3 objects T that have
> T(data='a'), T(data='b'), T(data='c') but no
> value for the id column. The objective is to insert the 3 data values,
> get back the ids and correctly match them with
> the correct 3 objects.
>
> > No. Sadly, adding that ORDER BY is just voodoo programming, because
> > it applies to the result of the SELECT while promising nothing about
> > the order in which INSERT/RETURNING will act on those rows.
>
> I wasn't probably clear, it's fine if INSERT/RETURNING order is
> arbitrary, what matters is that the
> autoincementing values is executed in the same order as select, like
> mentioned in this
> previous message
> https://www.postgresql.org/message-id/29386.1528813619%40sss.pgh.pa.us
>
> Is that not the case?
>
> > Re-reading that 2012 thread, the main new observation I'd make today
> > is that parallel operation is a thing now, and it's not hard to foresee
> > that sometime soon we'll want to parallelize INSERTs. Which'd make it
> > *really* hard to promise anything about the order of RETURNING output.
>
> I think it's fine not promising anything about the order of RETURNING, but
> it would be very helpful having a way of tracking what input row
> generated a particular
> output row. Basically the sentinel case in the original post,
> without actually having to insert the sentinel into the table.
>
> > I think if you want to use RETURNING with multi-row inserts, the
> > thing to do is more like
> >
> > INSERT INTO t(data) VALUES ('a'), ('b'), ('c') RETURNING data, id
> >
> > and then explicitly match up the returned "data" values rather than
> > presuming they appear in the same order you wrote them in in VALUES.
> > Admittedly this might be problematic if some of the VALUES rows
> > are identical, but how much should you care?
>
> Well, the example is very easy, but it's hard to generalize when
> inserting multiple columns
> with possible complex values in them, since it would mean matching on
> possibly large json values,
> arrays, etc. So definitely not ideal
>
> Thanks,
> Federico
>
> On Tue, 11 Apr 2023 at 22:06, Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
> wrote:
> >
> > On 4/11/23 12:47, Federico wrote:
> > > Hello list,
> > >
> > https://www.sqlite.org/lang_returning.html#limitations_and_caveats
> > >
> > > Searching the archive seems that a using the INSERT SELECT ORDER BY
> > > form should be a better solution,
> > > so the above insert should be rewritten as
> > >
> > > INSERT INTO t(data)
> > > SELECT data FROM (VALUES ('a', 1), ('b', 2), ('c', 3)) as vv(data,
> > > num) ORDER BY num
> > > RETURNING id
> >
> > Or
> >
> > with i as (INSERT INTO t(data) VALUES ('a', 1), ('b', 2), ('c', 3)
> > returning id)
> > select i.id from i order by id;
> >
> > > Sorry for the long email,
> > > Thanks
> > >
> > > Federico
> > >
> > >
> >
> > --
> > Adrian Klaver
> > adrian(dot)klaver(at)aklaver(dot)com
> >
>
> A couple of comments. For the more generic, I prefer RETURNING *
you get back all the columns for matching. To me, this solves the problem
in a very generic way.
But SQL (and SET THEORY) basically imply you cannot trust the sequencing of
a set of transactions. Parallel execution is just a great simple example.

Secondarily, many frameworks I've worked with (and custom ones developed)
would actually call the SEQUENCE.NEXTVAL, and assign the IDs, in memory,
accepting that we would have gaping holes if some transactions were never
actually sent to the server. We did this a lot in master-detail GUI type
stuff. It's just easier. The children knew their parent ID, and all the
children ID's were effectively known before committing. It made for simple
code that never failed.
(for large datasets we would want one query that returned a set of IDs, we
could order that. And apply it to the records we were about to insert).
[Be Careful with GENERATED ALWAYS pks to OVERRIDE]

HTH

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Kirk Wolak 2023-04-12 21:45:19 Re: FW: Error!
Previous Message Rob Sargent 2023-04-12 21:29:34 Re: TEXT column > 1Gb