Re: Making joins involving ctid work for the benefit of UPSERT

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Making joins involving ctid work for the benefit of UPSERT
Date: 2014-07-28 15:37:07
Message-ID: CA+TgmoZdW5EfRVRgkdJzXG2qun2FnSf_c1eNunO0gS+svhp6+Q@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jul 23, 2014 at 7:35 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
>> It's certainly arguable whether you should INSERT and then turn failures
>> into an update or try to UPDATE and then turn failures into an INSERT;
>> we might even want to have both options available, though that smells
>> a little like airing too much of our dirty laundry. But I think I
>> generally favor optimizing for the UPDATE case for more or less the
>> same reasons Kevin articulated.
>
> I don't see the connection between this and Kevin's remarks. And FWIW,
> I don't see a reason to favor inserts or updates. Fortunately, what I
> have balances both cases very well, and doesn't cause bloat. The work
> of descending the index to lock it isn't wasted if an update is
> required. My implementation decides to either insert or update at
> literally the latest possible moment.

AFAIUI, this is because your implementation uses lwlocks in a way that
Andres and I both find unacceptable. My suspicion is that any version
of this that ends up getting committed is going to involve a risk of
bloat in cases involving retries, and I think it will be easier to
minimize bloat in an update-driven implementation. But I suppose
that's speculative.

>> Here you seem to be suggesting that I intended to propose your existing
>> design rather than something else, which I didn't. In this design,
>> you find the conflict (at most one) by scanning for the tuple to be
>> updated.
>
> Yes, but what if you don't see a conflict because it isn't visible to
> your snapshot, and then you insert, and only then (step 5), presumably
> with a dirty snapshot, you find a conflict? How does the loop
> terminate if that brings you back to step 1 with the same MVCC
> snapshot feeding the update?
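
To spell out the shape of the loop being described, here is a loose
PL/pgSQL-flavored sketch with invented table and function names (the
proposal's internal loop is of course not PL/pgSQL; this is only to
make the structure visible):

    -- Assumes: CREATE TABLE counters (key int PRIMARY KEY, val int);
    CREATE OR REPLACE FUNCTION upsert_counter(k int, v int) RETURNS void AS $$
    BEGIN
        LOOP
            -- Try the update first.
            UPDATE counters SET val = v WHERE key = k;
            IF found THEN
                RETURN;
            END IF;
            -- No row yet; try to insert.  A concurrent session may beat us.
            BEGIN
                INSERT INTO counters (key, val) VALUES (k, v);
                RETURN;
            EXCEPTION WHEN unique_violation THEN
                -- Someone inserted concurrently; loop and update instead.
            END;
        END LOOP;
    END;
    $$ LANGUAGE plpgsql;

In READ COMMITTED, each statement inside this function takes a fresh
snapshot, so the retried UPDATE can see the concurrently inserted row
and the loop terminates. A single command re-driven off one unchanged
MVCC snapshot has no analogous escape, which is exactly the termination
problem being raised.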

Good point. Maybe the syntax should be something like:

UPSERT table (keycol [, keycol] ...)
    { VALUES (val [, val] ...) [, ...] | select_query }

That would address both the concern about being able to pipe multiple
tuples through it and the point you just raised. We look for a row
that matches each given tuple on the key columns; if one is found, we
update it; if none is found, we insert.
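
For illustration, hypothetical invocations of that syntax (table and
column names invented here, not part of the proposal) might look like:

    UPSERT accounts (id)
        VALUES (1, 'alice', 100.00), (2, 'bob', 250.00);

    UPSERT accounts (id)
        SELECT id, name, balance FROM staging_accounts;

The first form matches each VALUES row against an existing row on "id";
the second pipes arbitrarily many tuples from a query through the same
insert-or-update decision.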

> I agree that you want to uniquely identify each tuple. What I meant
> was, why should we not be able to upsert multiple rows in a single
> command? What's wrong with that?

Nothing.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
