how to return data from insert into ... on conflict ... returning ... into

From: Les <nagylzs(at)gmail(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: how to return data from insert into ... on conflict ... returning ... into
Date: 2023-06-20 20:26:23
Message-ID: CAKXe9UAuza2LTAJm_rH7JtXxSLg8mSfTMOfqLjmUwsDQCK7JUQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Consider this example

drop table if exists tbl;

drop sequence if exists seq;

create sequence seq;

create table tbl(

id int8 not null primary key,

d bytea not null,

h bytea not null, -- hash of the data, calculated automatically

dummy byte default 0 -- dummy value, see below...

);

alter table tbl add constraint uidx_tbl_h unique(h);

create or replace function trg() returns trigger language plpgsql as

$function$

begin

new.h = sha256(new.d); -- auto-hash

if new.id is null then

new.id = nextval('seq');

end if;

return new;

end;

$function$;

create trigger trg before insert or update on tbl for each row execute
procedure trg();

The hash "h" is calculated automatically for each data value "d", and it
is a unique value. Let's suppose that for technical reasons, we want the
primary key to be an int8 value, and foreign keys in other tables will be
referencing tbl records using the sequentially generated tbl.id values.

The basic idea is that for already existing "d" values, we do not insert a
new record, but use the existing record and its identifier.

This code actually works:

do

$body$

declare

aid int8;

begin

insert into tbl(d) values ('1') returning id into aid; -- ok

raise notice '1->%', aid; -- ok

insert into tbl(d) values ('2') returning id into aid; -- ok

raise notice '2->%', aid; -- ok

insert into tbl(d) values ('1') on conflict(h) do update set dummy=0
returning id into aid; -- ok;

raise notice '1->%', aid; -- ok

insert into tbl(d) values ('2') on conflict(h) do update set dummy=0
returning id into aid; -- ok;

raise notice '2->%', aid; -- ok

end;

$body$;

It will display the same id value for the same data values. But it updates
the record even when it does not need to be updated.

This code below does not work:

delete from tbl;

do

$body$

declare

aid int8;

begin

insert into tbl(d) values ('1') returning id into aid; -- ok

raise notice '1->%', aid; -- ok

insert into tbl(d) values ('2') returning id into aid; -- ok

raise notice '2->%', aid; -- ok

insert into tbl(d) values ('1') on conflict(h) do nothing returning id into
aid; -- ok;

raise notice '1->%', aid; -- null ??????

insert into tbl(d) values ('1') on conflict(h) do nothing returning id into
aid; -- ok;

raise notice '2->%', aid; -- null ??????

end;

$body$;

First it displays two non-null identifiers, then it displays two NULL
values.

The alternative would be something like this:

delete from tbl;

do

$body$

declare

aid int8;

begin

insert into tbl(d) values ('1') returning id into aid; -- ok

raise notice '1->%', aid; -- ok

insert into tbl(d) values ('2') returning id into aid; -- ok

raise notice '2->%', aid; -- ok

select id from tbl into aid where h = sha256('1');

if not found then

insert into tbl(d) values ('1') on conflict(h) do update set id=id+0
returning id into aid;

end if;

raise notice '1->%', aid; -- null ??????

select id from tbl into aid where h = sha256('2');

if not found then

insert into tbl(d) values ('2') on conflict(h) do update set id=id+0
returning id into aid;

end if;

raise notice '2->%', aid; -- null ??????

end;

$body$;

But there are several problems with this "solution":

1. Running select to check for existence introduces a race condition. Of
course it also depends on the transaction isolation, but in general it is
not guaranteed that the insert won't fail with the unique constraint on h,
even if the select did not find a matching record. (It might also introduce
a deadlock?) I might be wrong on this, I don't really know how plpgsql
procedures are executed, but I suspect that they can run in parallel.
2. Notice how the code changed. The calculation of h is done in the trigger
and also at two other places. This was a very trivial example, but in a
real world scenario, the calculation can be costly, and even if the above
solution works, it must calculate the possibly conflicting values twice. It
is ineffective, and it also requires to factor out the calculations to
separate functions (or even worse, duplicate the code for the
calculations). Even if calculations are not costly, this "solution" may
introduce a dependency hell, because the trg() trigger can access field
values that are calculated by other triggers that depend on each other. One
would have to pre-calculate everything at every place where "insert into"
is needed for the table, duplicating code and/or factoring the calculations
out to a function with many parameters.
3. This trivial example only had a single unique constraint, but there
could be more. When you have many unique constraints, then you have to
write multiple SELECT statements to check for existence before doing the
INSERT. Then the above "solution" becomes ugly and questionable.

Please note that adding a dummy byte does not solve the problem, because it
will always update the record, even if it does not need to be updated.
Degrades performance, possibly executes other triggers that do unwanted
modifications to the database.

I have read the documentation here

https://www.postgresql.org/docs/current/sql-insert.html#SQL-ON-CONFLICT

and the only possible actions are "do nothing" and "do update". The "do
nothing" does not update the record, that is clear. But it also does not
return any data, and that was not obvious to me. Why can't it return the
data from the conflicted record? After the instruction is completed, "GET
DIAGNOSTICS" could still be used to check the number of updates.

I understand that changing this behaviour may break backward compatibility.
Would it be possible to introduce a "DO RETURN" clause that returns the
data, even when there was a conflict?

Are there any better alternatives?

Thank you,

Laszlo

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Marc Millas 2023-06-20 20:34:48 Re: pb with join plan
Previous Message David Rowley 2023-06-20 20:14:05 Re: pb with join plan