Re: Guidance on INSERT RETURNING order

From: John Howroyd <jdhowroyd(at)googlemail(dot)com>
To: Mike Bayer <mike_mp(at)zzzcomputing(dot)com>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Guidance on INSERT RETURNING order
Date: 2023-04-13 03:02:40
Message-ID: CAAGaYBxXcjUewV2kH=fjZCRoXSCDZPsvkSGz10LCvnW71wCj5A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

The ideal solution would be that the RETURNING values are in the same order
as the declared rows in the table literal "FROM (VALUES (10, 11, 1), (12,
13, 2)) AS sel(p1, p2, num)". But before any of you jump down my throat
about about sets and a bar to parallel inserts, I think it's a small matter
of some internal smoke and mirrors: simply marshal the RETURNING results
through and in memory/temporary table maintaining the order from the SELECT
and return these to the client upon commit. At this moment, I believe the
folk at SQLAlchemy are looking for a work around that can work for most use
cases with either an identity column or a fake "sentinel" column for
matching back to the ORM objects. There may be an upper limit of 1000 to
the number rows in a single insert (this arises in MS SQL Server, who
absolutely do not guarantee the order of their OUTPUT clause -- it's tied
to the physical row insert rather than marshalled).

My feeling is that PostgreSQL may already do this. See the commit:
https://github.com/postgres/postgres/commit/9118d03a8cca3d97327c56bf89a72e328e454e63
and the original message thread
https://www.postgresql.org/message-id/flat/CADBwkHv1nUx5bqxqRLp26k8iUWSnY3uW-1_0EMYxKns0t2JnUA%40mail.gmail.com
and references to undesired behaviour prior to PostgreSQL 9.6 such as in
https://www.postgresql.org/docs/current/sql-select.html.

The test cases in the above commit use:
SELECT nextval('sequence'), datavals FROM ... ORDER BY something;
type syntax. And I presume that's exactly what's happening internally when
there's a serial identity column (via query rewrite rules or another
mechanism).

So really, that just needs confirming. What happens with triggers?
Presumably the same as long as query rewrite rules are used, but this is
presumably getting deeper into the code for actual inserts after the
initial select. The jump to the output being ordered, is just a guess but
there's some bits and pieces that seem to suggest that there may indeed be
a marshalling process going on in the background (whether or not that is
linked to the original order is another matter).

I have set up a PostgreSQL server to test if I can break this hypothesis
and see what query explains can allude to. Does anyone have a test case
where the order is not preserved?

Might I also point out that if the implementation of parallel inserts does
create a bar then doing so may end up with the programmatic interfaces
(such as SQLAlchemy) not being able to use that feature (possibly reverting
back to single inserts). Ur, so what would be the problem being solved
with parallel inserts?

On Thu, 13 Apr 2023 at 02:27, Mike Bayer <mike_mp(at)zzzcomputing(dot)com> wrote:

> We do something different for uuids. These are usually created client
> side anyway or are very easy to create client side if not and then we also
> have a separate sentinel column option.
>
> Overall the whole thing is an optimization that can be turned off for
> uncommon cases. We just can't go back to having insert of 1000 rows be
> 3-5x slower for all tables w server generated primary keys which is what
> statement at a time gets us. Surrogate integer Pks w sequence or identity
> is the big case.
>
> On Wed, Apr 12, 2023, at 8:58 PM, Thorsten Glaser wrote:
> > On Wed, 12 Apr 2023, Mike Bayer wrote:
> >
> >>ascending values for "mytable_id_seq" will correspond to the ascending
> >>values for "num".
> >
> > But, again, what if it’s uuid or something instead of a sequence?
> >
> > bye,
> > //mirabilos
> > --
> > 15:41⎜<Lo-lan-do:#fusionforge> Somebody write a testsuite for helloworld
> :-)
>
>
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message John Howroyd 2023-04-13 03:05:56 Re: Guidance on INSERT RETURNING order
Previous Message David Rowley 2023-04-13 02:39:26 Re: [E] Re: parallel aggregation