Quick Links

Re: INSERT ... ON CONFLICT {UPDATE | IGNORE}

From:	Peter Geoghegan <pg(at)heroku(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Bruce Momjian <bruce(at)momjian(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Craig Ringer <craig(at)2ndquadrant(dot)com>
Subject:	Re: INSERT ... ON CONFLICT {UPDATE \| IGNORE}
Date:	2014-10-08 22:24:51
Message-ID:	CAM3SWZTTiqocznYvOJks3qQX6W1-Ln7qXgKyV0c7K7fmHffOVw@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Wed, Oct 8, 2014 at 2:04 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> While I am also happy with taking a vote, if we do so I vote against
>> even this much less MERGE-like syntax. It's verbose, and makes much
>> less sense when the mechanism is driven by would-be duplicate key
>> violations rather than an outer join.
>
> It wouldn't be driven by an outer join, not sure where that comes from.

Right, I understood that it wouldn't be - which is the point. So with
an UPSERT that retains influence from MERGE, NOT MATCHED means "no
conflict", MATCHED means "conflict". That just seems like an odd way
to spell the concept given, as you say, that we're not talking about
an outer join.

> MERGE is verbose, I agree. I don't always like the SQL Standard, I
> just think we should follow it as much as possible. You can't change
> the fact that MERGE exists, so I don't see a reason to have two
> variants of syntax that do roughly the same thing.
>
> MERGE syntax would allow many things, such as this...
> WHEN NOT MATCHED AND x > 7 THEN
> INSERT
> WHEN NOT MATCHED THEN
> INSERT
> WHEN MATCHED AND y = 5 THEN
> DO NOTHING
> WHEN MATCHED THEN
> UPDATE
>
> etc

But then you can have before row insert triggers fire, which as you
acknowledge is more surprising with this syntax.

> I spoke to someone today that preferred a new command keyword, like
> UPSERT, because the semantics of triggers are weird. Having a before
> insert trigger fire when there is no insert is quite strange. Properly
> documenting that on hackers would help; has the comments made on the
> Django list been replayed here in some form?

Yes. It's also mentioned in the commit message of CONFLICTING() (patch
0003-*). And the documentation (both the proposed INSERT
documentation, and the trigger documentation). There is a large
comment on it in the code. So I've said it many times.

> I'm very scared by your comments about data modifying CTEs etc.. You
> have no definition of how they will work, not tests of that. That part
> isn't looking like a benefit as things currently stand.

Actually, I have a few smoke tests for that. But I don't see any need
for special handling. When you have a data-modifying CTE, it can
contain an INSERT, and there are no special restrictions on that
INSERT (other than that it may not itself have a CTE, but that's true
more generally). You can have data-modifying CTEs containing INSERTs,
and data-modifying CTEs containing UPDATEs....what I've done is have
data-modifying CTEs contain INSERTs that also happen to have an ON
CONFLICT UPDATE clause.

This new clause of INSERTs is in no more need of special documentation
regarding interactions with data-modifying CTEs than UPDATE .... WHERE
CURRENT OF is. The only possible exception I can think of would be
cardinality violations where a vanilla INSERT in one part of a command
(one data-modifying CTE) gives problems to the "UPSERT part" of the
same command (because we give a special cardinality violation message
when we try to update the same tuple twice in the same command). But
that's a pretty imaginative complaint, and I doubt it would really
surprise someone.

Why would you be surprised by the fact that a new clause for INSERT
plays nicely with existing clauses? It's nothing special - there is no
special handling.

> I'm still waiting for some more docs to describe your intentions so
> they can be reviewed.

I think it would be useful to add several more isolation tests,
highlighting some of the cases you talked about. I'll work on that.
While the way forward for WITHIN isn't clear, I think a WITHIN PRIMARY
KEY variant would certainly be useful. Maybe it would be okay to
forget about naming a specific unique index, while supporting an
(optional) WITHIN PRIMARY KEY/NOT WITHIN PRIMARY KEY. It doesn't
totally solve the problems, but may be a good compromise that mostly
satisfies people that want to be able to clearly indicate user intent
(Kevin, in particular), and satisfies other people that don't want to
name a unique index (Heikki, in particular). Certainly, the Django
people would like that, since they said as much.

> Also, it would be useful to hear that your're going to do something
> about the references to rows using conflicting(), since nobody has
> agreed with you there. Or hopefully even that you've listened and
> implemented something differently already. (We need that, whatever we
> do with other elements of syntax).

Do you really expect me to do major work on some aspect of the syntax
like that, given, as you say, that nobody explicitly agreed with me
(and only you disagreed with me)? The only remark I heard on that was
from you (you'd prefer to use NEW.* and OLD.*). But you spent much
more time talking about getting something MERGE-like, which
NEW.*/OLD.* clearly isn't.

CONFLICTING() is very close (identical?) to MySQL's use of "ON
DUPLICATE KEY UPDATE val = VALUES(val)". I'm happy to discuss that,
but it's news to me that people take particular issue with it.

> Overall, I'm not seeing too many comments that indicate you are
> accepting review comments and acting upon them. If you want acceptance
> from others, you need to begin with some yourself.

What, specifically, have I failed to act on? We are discussing the
syntax here. I have very valid practical reasons for wanting to make
this feature a clause of INSERT. That is a view that Andres seemed to
agree with [1], for example.

[1] http://www.postgresql.org/message-id/20140929070235.GP1169@alap3.anarazel.de
--
Peter Geoghegan

In response to

Re: INSERT ... ON CONFLICT {UPDATE | IGNORE} at 2014-10-08 21:04:19 from Simon Riggs

Responses

Re: INSERT ... ON CONFLICT {UPDATE | IGNORE} at 2014-10-09 05:49:47 from Simon Riggs

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Andres Freund	2014-10-08 22:32:21	Re: pg_background (and more parallelism infrastructure patches)
Previous Message	Gavin Flower	2014-10-08 22:23:59	Re: What exactly is our CRC algorithm?