UPSERT in Postgres

From: Louis Tian <louis(dot)tian(at)aquamonix(dot)com(dot)au>
To: "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: UPSERT in Postgres
Date: 2023-04-06 06:21:34
Message-ID: SY4P282MB1052A434D91AC0656589B6AEA6919@SY4P282MB1052.AUSP282.PROD.OUTLOOK.COM
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

This is a question/feature request. 

Virtually all references on the internet today suggests one can "upsert" in PostgreSQL using the "insert ... on conflict do update ..." statement. 
But this is not complete true. 

The PostgreSQL's own wiki page (https://wiki.postgresql.org/wiki/UPSERT#.22UPSERT.22_definition) defines UPSERT as
 "UPSERT" is a DBMS feature that allows a DML statement's author to atomically either insert a row, or on the basis of the row already existing, UPDATE that existing row instead,
I believe this definition is correct and consistent with defintion elsewhere (https://en.wiktionary.org/wiki/upsert)

An implicit assumption behind this definition is that table must have a primary key for the upsert operation to make sense since it's the primary key that uniquely identifies a row. 
let's say we have a very simple table

```
create table person (
      id int primary key,
      name text not null,
           is_active boolean
)
``` 

Given the definition of upsert, I'd expect an upsert command to do the following.
- `upsert into person (id, name) values (0, 'foo')` to insert a new row 
- `upsert into person (id, is_active) values (0, true)` updates the is_active column for the row inserted above

Naturally, since there isn't a real upsert command in PostgreSQL this won't work today. 
But can we achieve the same effect with "INSERT ... ON CONFLICT DO UPDATE" like a lot of references on the internet seems to suggest. 

insert into person (id, name) values (0, 'foo') on conflict ("id") do update set id=excluded.id, name=excluded.name
insert into person (id, is_active) values (0, true) on conflict ("id") do update set id=excluded.id, is_active=excluded.is_active

Unfortunately. the second statement will fail due to violation of the not null constraint on the "name" column.
PostgreSQL will always try to insert the row into the table first. and only fallback to update when the uniqueness constraint is violated.
Is this behavior wrong? maybe not, I think it is doing what it reads quite literally.
That being said, I have never had a need for the ON CONFLICT DO UPDATE statement other than where I need upsert.
But using it as "upsert" is only valid when the table is absent of any NOT NULL constraint on it's non primary key columns.
So, if my experience/use case is typical (meaning the main purpose / use case for ON CONFLICT DO UPDATE is to support upsert) then it can be argue the current behavior is incorrect?

This has been a source confusion to say at least.
https://stackoverflow.com/questions/48816629/on-conflict-do-nothing-in-postgres-with-a-not-null-constraint
https://www.postgresql.org/message-id/flat/1996438.1604952901%40sss.pgh.pa.us#ace8adc1354265aca5672028faea0c0f

The MERGE command introduced in PG15 in theory can be used to do UPSERT properly that is void of the aforementioned limitation.
The downside is it is rather verbose.

*Question*
This there a way to do an upsert proper prior to PG15?

*Feature Request*
Given that UPSERT is an *idempotent* operator it is extremely useful.
Would love to see an UPSERT command in PostgreSQL so one can 'upsert' properly and easily.

Regards,
Louis Tian

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Marco Fortina 2023-04-06 10:51:07 Proposal: Shared Work Mem Area
Previous Message Tom Lane 2023-04-05 15:07:59 Re: Possible old and fixed bug in Postgres?