Re: [HACKERS] MERGE SQL Statement for PG11

From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-02-27 18:19:11
Message-ID: CABOikdM8wi9_vFpU6_HtYFCA1xuovRXfpznawb63B60DH=pEoA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Feb 16, 2018 at 6:37 AM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:

>
> ISTM that a MERGE isn't really a thing that replaces 2 or 3 other DML
> statements, at least in most cases. It's more like a replacement for
> procedural code with an outer join, with an INSERT, UPDATE or DELETE
> that affects zero or one rows inside the procedural loop that
> processes matching/non-matching rows. The equivalent procedural code
> could ultimately perform *thousands* of snapshot acquisitions for
> thousands of RC DML statements. MERGE is sometimes explained in terms
> of "here is the kind of procedural code that you don't have to write
> anymore, thanks to MERGE" -- that's what the code looks like.
>
> I attach a rough example of this, that uses plpgsql.
>

Thanks for writing the sample code. I understand you probably don't mean to
suggest that we need to mimic the behaviour of the plpgsql code and the
semantics offered by MERGE would most likely be different than what the
plpgsql sample does. Because there are several problems with the plpgsql
code:

- It would never turn a MATCHED case into a NOT MATCHED case because of
concurrent UPDATE/DELETE
- The WHERE clauses attached to the UPDATE/DELETE statement should be using
the quals attached to the WHEN clauses to ensure they are evaluated on the
new version of the row, if needed.

>
> >> Some novel new behavior -- "EPQ with a twist"-- is clearly necessary.
> >> I feel a bit uneasy about it because anything that anybody suggests is
> >> likely to be at least a bit arbitrary (EPQ itself is kind of
> >> arbitrary). We only get to make a decision on how "EPQ with a twist"
> >> will work once, and that should be a decision that is made following
> >> careful deliberation. Ambiguity is much more likely to kill a patch
> >> than a specific technical defect, at least in my experience. Somebody
> >> can usually just fix a technical defect.
>

TBH that's one reason why I like Simon's proposed behaviour of throwing
errors in case of corner cases. I am not suggesting that's what we do at
the end, but it's definitely worth considering.

> >
> >
> > While I agree, I think we need to make these decisions in a time bound
> > fashion. If there is too much ambiguity, then it's not a bad idea to
> settle
> > for throwing appropriate errors instead of providing semantically wrong
> > answers, even in some remote corner case.
>
> Everything is still on the table, I think.
>

Ok.

>
> > Ok. I am now back from holidays and I will too start thinking about this.
> > I've also requested a colleague to help us with comparing it against
> > Oracle's behaviour. N That's not a gold standard for us, but knowing how
> > other major databases handle RC conflicts, is not a bad idea.
>
> The fact that Oracle doesn't allow WHEN MATCHED ... AND quals did seem
> like it might be significant to me.
>
>
Here are some observations from Rahila's analysis so far. I must say,
Oracle's handling seems quite inconsistent, especially the conditions under
which it sometimes re-evaluates the join and sometimes don't.

- Oracle does not support multiple WHEN MATCHED clauses. So the question of
re-checking all WHEN clauses does not arise.

- Only one UPDATE and one DELETE clause is supported. The DELETE must be
used in conjunction with UPDATE.

- The DELETE clause is invoked iff the UPDATE clause is invoked. It works
on the updated rows. Since the row is already updated (and locked) by the
MERGE, DELETE action never blocks on a concurrent update/delete

- MERGE does not allow updating the column used in the JOIN's ON qual

- In case of concurrent UPDATE, the join is re-evaluated iff the concurrent
UPDATE updates (modifies?) the same column that MERGE is updating OR a
column
that MERGE is referencing in the WHERE clause is updated by the
concurrent update. IOW if the
MERGE and concurrent UPDATE is operating on different columns, join is NOT
re-evaluated, thus possibly invoking WHEN MATCHED action on a row which no
longer matches the join condition.

- In case of concurrent DELETE, the join is re-evaluated and the action may
change from MATCHED to NOT MATCHED

I am curiously surprised by it's behaviour of re-evaluating join only when
certain columns are updated. It looks to me irrespective of what we choose,
our implementation would be much superior to what Oracle offers.

BTW I've sent v17a of the patch, which is very close to being complete from
my perspective (except some documentation fixes/improvements). The only
thing pending is the decision to accept or change the currently implemented
concurrency semantics.

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2018-02-27 18:36:59 Sigh, I broke crake again
Previous Message Pavan Deolasee 2018-02-27 17:53:06 Re: [HACKERS] MERGE SQL Statement for PG11