Re: MERGE SQL Statement for PG11

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MERGE SQL Statement for PG11
Date: 2017-10-28 18:39:33
Message-ID: CAH2-Wz=AxEKE7sJEukPN3Vgu6sU5QwuLCBHuF-oJbkOLWsxTQw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Oct 28, 2017 at 3:10 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> SQL:2011 specifically states "The extent to which an
> SQL-implementation may disallow independent changes that are not
> significant is implementation-defined”, so in my reading the above
> behaviour would make us fully spec compliant. Thank you to Peter for
> providing the infrastructure on which this is now possible for PG11.
>
> Serge puts this very nicely by identifying two different use cases for MERGE.

MERGE benefits from having a join that is more or less implemented in
the same way as any other join. It can be a merge join, hash join, or
nestloop join. ON CONFLICT doesn't work using a join.

Should I to take it that you won't be supporting any of these
alternative join algorithms? If not, then you'll have something that
really isn't comparable to MERGE as implemented in Oracle, SQL Server,
or DB2. They *all* do this.

Would the user be able to omit WHEN NOT MATCHED/INSERT, as is the case
with every existing MERGE implementation? If so, what actually happens
under the hood when WHEN NOT MATCHED is omitted? For example, would
you actually use a regular "UPDATE FROM" style join, as opposed to the
ON CONFLICT infrastructure? And, if that is so, can you justify the
semantic difference for rows that are updated in each scenario
(omitted vs. not omitted) in READ COMMITTED mode? Note that this could
be the difference between updating a row when *no* version is visible
to our MVCC snapshot, as opposed to doing the EPQ stuff and updating
the latest row version if possible. That's a huge, surprising
difference. On top of all this, you risk live-lock if INSERT isn't a
possible outcome (this is also why ON CONFLICT can never accept a
predicate on its INSERT portion -- again, quite unlike MERGE).

Why not just follow what other systems do? It's actually easier to go
that way, and you get a better outcome. ON CONFLICT involves what you
could call a sleight of hand, and I fear that you don't appreciate
just how specialized the internal infrastructure is.

> Now, I accept that you might also want a MERGE statement that
> continues to work even if there is no unique constraint, but it would
> need to have different properties to the above. I do not in any way
> argue against adding that.

Maybe you *should* be arguing against it, though, and arguing against
ever supporting anything but equijoins, because these things will
*become* impossible if you go down that road. By starting with the ON
CONFLICT infrastructure, while framing no-unique-index-support as work
for some unspecified future release, you're leaving it up to someone
else to resolve the problems. Someone else must square the circle of
mixing ON CONFLICT semantics with fully generalized MERGE semantics.
But who?

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2017-10-28 19:49:12 Re: MERGE SQL Statement for PG11
Previous Message Julien Rouhaud 2017-10-28 17:44:24 Re: unsafe tuple releasing in get_default_partition_oid