From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Making joins involving ctid work for the benefit of UPSERT
Date: 2014-07-28 15:37:07
Message-ID: CA+TgmoZdW5EfRVRgkdJzXG2qun2FnSf_c1eNunO0gS+svhp6+Q@mail.gmail.com
Lists: pgsql-hackers
On Wed, Jul 23, 2014 at 7:35 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
>> It's certainly arguable whether you should INSERT and then turn failures
>> into an update or try to UPDATE and then turn failures into an INSERT;
>> we might even want to have both options available, though that smells
>> a little like airing too much of our dirty laundry. But I think I
>> generally favor optimizing for the UPDATE case for more or less the
>> same reasons Kevin articulated.
>
> I don't see the connection between this and Kevin's remarks. And FWIW,
> I don't see a reason to favor inserts or updates. Fortunately, what I
> have balances both cases very well, and doesn't cause bloat. The work
> of descending the index to lock it isn't wasted if an update is
> required. My implementation decides to either insert or update at
> literally the latest possible moment.
AFAIUI, this is because your implementation uses lwlocks in a way that
Andres and I both find unacceptable. My suspicion is that any version
of this that ends up getting committed is going to involve a risk of
bloat in cases involving retries, and I think it will be easier to
minimize bloat in an update-driven implementation. But I suppose
that's speculative.
>> Here you seem to be suggesting that I intended to propose your existing
>> design rather than something else, which I didn't. In this design,
>> you find the conflict (at most one) by scanning for the tuple to be
>> updated.
>
> Yes, but what if you don't see a conflict because it isn't visible to
> your snapshot, and then you insert, and only then (step 5), presumably
> with a dirty snapshot, you find a conflict? How does the loop
> terminate if that brings you back to step 1 with the same MVCC
> snapshot feeding the update?
Good point. Maybe the syntax should be something like:
UPSERT table (keycol [, keycol] ...)
    { VALUES (val [, val] ...) [, ...] | select_query }
That would address both the concern about being able to pipe multiple
tuples through it and the point you just raised. We look for a row
that matches each given tuple on the key columns; if one is found, we
update it; if none is found, we insert.
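To make that concrete, a hypothetical invocation might look like this
(the table and column names are invented for the example, and the
syntax itself is of course only a proposal at this point):

    UPSERT counters (id)
        VALUES (1, 40), (2, 7);

Each VALUES row would be matched against the table on "id": (1, 40)
updates the existing row with id = 1 if there is one and is inserted
otherwise, and likewise for (2, 7).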
> I agree that you want to uniquely identify each tuple. What I meant
> was, why should we not be able to upsert multiple rows in a single
> command? What's wrong with that?
Nothing.
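The select_query branch of the hypothetical syntax above would cover
that case directly, e.g. (again with invented table names):

    UPSERT counters (id)
        SELECT id, value FROM staging_counters;

with every row the query produces going through the same
match-on-key, then update-or-insert logic.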
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company