Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Greg Stark <stark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
Date: 2013-09-22 19:54:57
Message-ID: CAM3SWZQ4O244fj_RwJUiZ=CiGOSjaYAc2_NwQYwKrvNAT1ExnA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Sep 22, 2013 at 2:10 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> I can't follow here. Why does e.g. the promise tuple approach bloat more
> than the subxact example?
> The protocol is roughly:
> 1) Insert index pointer containing an xid to be waiting upon instead of
> the target tid into all indexes
> 2) Insert heap tuple, we can be sure there's no conflict now
> 3) Go through the indexes and repoint the item to point to the tid of the
> heaptuple instead of the xid.
>
> There's zero heap or index bloat in the uncontended case. In the
> contended case it's just the promise tuples from 1) that are inserted
> before the conflict is detected. Those can be marked as dead when the
> conflict happened.

It depends on your definition of the contended case. You're assuming
that insertion is the most probable outcome, when in fact much of the
time updating is just as likely or even more likely. Many promise
tuples may be inserted before actually seeing a conflict and deciding
to update/lock for update. In order for the example in the docs to
bloat at all, both the UPDATE and the INSERT need to fail within a
tiny temporal window - that's what I mean by uncontended (it is
usually tiny because if the UPDATE blocks, that often means it will
succeed anyway, but if not the INSERT will very probably succeed).

This is because the UPDATE won't bloat when no existing row is seen,
because its subplan will return no rows. The INSERT will only bloat if
it fails, which is generally very unlikely because of the fact that
the UPDATE just did nothing. Contrast that with bloating almost every
time an UPDATE is necessary (I think that bloat that is generally
cleaned up synchronously is still bloat). That's before we even talk
about the additional overhead. Making the locks expensive to
release/clean-up could really hurt, since it appears they'll *have* to
be unlocked before row locking, and during that time concurrent
activity affecting the row to be locked can necessitate a full restart
- that's a window we want to keep as small as possible.

I think reviewer time would for now be much better spent discussing
the patch at a higher level (along the lines of my recent mail to
Stephen and Robert). I've been at least as guilty as anyone else in
getting mired in these details. We'll be much better equipped to have
this discussion afterwards, because it isn't clear to us if we really
need or would find it at all useful to have long-lasting value locks,
how frequently we'll need to retry and for what reasons, and so on.

My immediate concern as the patch author is to come up with a better
answer to the problem that Robert described [1], because "hey, I
locked the row -- you take it from here user that might not have any
version of it visible to you" is not good enough. I hope that there
isn't a tension between solving that problem and offering the
flexibility and composability of the proposed syntax.

[1] http://www.postgresql.org/message-id/AANLkTineR-rDFWENeddLg=GrkT+epMHk2j9X0YqpiTY8@mail.gmail.com

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2013-09-22 20:39:36 Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
Previous Message Tomas Vondra 2013-09-22 19:43:05 Re: Improving avg performance for numeric