Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
Date: 2013-12-30 03:57:31
Message-ID: CAM3SWZRNM=Uf-UVE_uCxwQs0hiyZEXLdAPmcnmBWaoALW8Yi1Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Dec 29, 2013 at 8:18 AM, Heikki Linnakangas
<hlinnakangas(at)vmware(dot)com> wrote:
>> My position is not based on a gut feeling. It is based on carefully
>> considering the interactions of the constituent parts, plus the
>> experience of actually building a working prototype.
>
>
> I also carefully considered all that stuff, and reached a different
> conclusion. Plus I also actually built a working prototype (for some
> approximation of "working" - it's still a prototype).

Well, clearly you're in agreement with me about unprincipled
deadlocking. That's what I was referring to here.

> Frankly I'm pissed off that you dismissed from the start the approach that
> seems much better to me. I gave you a couple of pointers very early on: look
> at the way we do exclusion constraints, and try to do something like promise
> tuples or killing an already-inserted tuple. You dismissed that, so I had to
> write that prototype myself.

Sorry, but that isn't consistent with my recollection at all. The
first e-mail you sent to any of the threads on this was on 2013-11-18.
Your first cut at a prototype was on 2013-11-19, the very next day. If
you think that I ought to have been able to know what you had in mind
based on conversations at pgConf.EU, you're giving me way too much
credit. The only thing vaguely along those lines that I recall you
mentioning to me in Dublin was that you thought I should make this
work with exclusion constraints - I was mostly explaining what I'd
done, and why. I was pleased that you listened courteously, but I
didn't have a clue what you had in mind, not least because to the best
of my recollection you didn't say anything about killing tuples. I'm
not going to swear that you didn't say something like that, since a
lot of things were said in a relatively short period, but it's
certainly true that I was quite curious about how you might go about
incorporating exclusion constraints into this for a while before you
began visibly participating on list.

Now, when you actually posted the prototype, I realized that it was
the same basic design that I'd cited in my very first e-mail on the
IGNORE patch (the one that didn't have row locking at all) - nobody
else wanted to do heap insertion first for promise tuples. I read that
2007 thread [1] a long time ago, but that recognition only came when
you posted your first prototype, or perhaps shortly before when you
started participating on list.

I am unconvinced that making this work for exclusion constraints is
compelling, except for IGNORE, which is sensible but something I would
not weigh heavily at all. In any case, since your implementation
currently doesn't lock more than one row per tuple proposed for
insertion (even though exclusion constraints could have a huge number
of rows to lock when you propose to insert a row with a range covering
a decade, and many rows need to be locked, where with unique indexes
you only ever lock either 0 or 1 rows per slot). I could fairly easily
extend my patch to have it work for exclusion constraints with IGNORE
only.

You didn't try and convince me that what you have proposed is better
than what I have. You immediately described your approach. You did say
some things about buffer locking, but you didn't differentiate between
what was essential to my design, and what was incidental, merely
calling it scary (i.e. you did something similar to what you're
accusing me of here - you didn't dismiss it, but you didn't address it
either). If you look back at the discussion throughout late November
and much of December, it is true that I am consistently critical, but
that was clearly a useful exercise, because now we know there is a
problem to fix.

Why is your approach better? You never actually said. In short, I
think that my approach may be better because it doesn't conflate row
locking with value locking (therefore making it easier to reason
about, maintaining modularity), and that it never bloats, and that
releasing locks is clearly cheap which may matter a lot sometimes. I
don't think the "intent exclusive" locking of my most recent revision
is problematic for performance - as the L&Y paper says, exclusive
locking leaf pages only is not that problematic. Extending that in a
way that still allows reads, only for an instant isn't going to be too
problematic.

I'm not sure that this is essential to your design, and I'm not sure
what your thoughts are on this, but Andres has defended the idea of
promise tuples that lock old values indefinitely pending the outcome
of another xact where we'll probably want to update, and I think this
is a bad idea. Andres recently seemed less convinced of this than in
the past [2], but I'd like to hear your thoughts. It's very pertinent,
because I think releasing locks needs to be cheap, and rendering old
promise tuples invisible is not cheap.

I'm not trying to piss anyone off here - I need all the help I can
get. These are important questions, and I'm not posing them to you to
be contrary.

> Even after that, you have spent zero effort to
> resolve the remaining issues with that approach, proclaiming that it's
> somehow fundamentally flawed and that locking index pages is obviously
> better. It's not. Sure, it still needs work, but the remaining issue isn't
> that difficult to resolve. Surely not any more complicated than what you did
> with heavy-weight locks on b-tree pages in your latest patch.

I didn't say that locking index pages is obviously better, and I
certainly didn't say anything about what you've done being
fundamentally flawed. I said that I "have limited enthusiasm for
expanding the modularity violation that exists within the btree AM".
Based on what Andres has said in the recent past on this thread about
the current btree code, that "in my opinion, bt_check_unique() doing
[locking heap and btree buffers concurrently] is a bug that needs
fixing" [3], can you really blame me? What this patch does not need is
another controversy. It seems pretty reasonable and sane that we'd
implement this by generalizing from the existing mechanism. Plus there
is plenty of evidence of other systems escalating what they call
"latches" and what we call buffer locks to heavyweight locks, I
believe going back to the 1970s. It's far from radical.

> Now, enough with the venting. Back to drawing board, to figure out how best
> to fix the deadlock issue with the insert_on_dup-kill-on-conflict-2.patch.
> Please help me with that.

I will help you. I'll look at it tomorrow.

> PS. In btreelock_insert_on_dup_v5.2013_12_28.patch, the language used in the
> additional text in README is quite difficult to read. Too many difficult
> sentences and constructs for a non-native English speaker like me. I had to
> look up "concomitantly" in a dictionary and I'm still not sure I understand
> that sentence :-).

Perhaps I should have eschewed obfuscation and espoused elucidation
here. I was trying to fit the style of the surrounding text. I just
mean that aside from the obvious reason for holding a pin, doing so at
the same time precludes deletion of the buffer, which requires a
"super exclusive" lock on the buffer.

[1] http://www.postgresql.org/message-id/1172858409.3760.1618.camel@silverbirch.site

[2] http://www.postgresql.org/message-id/20131227075453.GB17584@alap2.anarazel.de

[3] http://www.postgresql.org/message-id/20130914221524.GF4071@awork2.anarazel.de
--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2013-12-30 05:02:57 Re: [PATCH] Regression tests in windows ignore white space
Previous Message Noah Misch 2013-12-30 03:02:07 Re: [bug fix] multibyte messages are displayed incorrectly on the client