Re: INSERT ... ON CONFLICT {UPDATE | IGNORE}

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Craig Ringer <craig(at)2ndquadrant(dot)com>
Subject: Re: INSERT ... ON CONFLICT {UPDATE | IGNORE}
Date: 2014-10-24 22:48:50
Message-ID: CAM3SWZTMJMvdWTB25vSR9zEXYmoAiQN=rxopLkQpZnMHmA1WFw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Oct 24, 2014 at 1:07 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> The problem here isn't that I haven't read your emails. I have read
> them all, including that one. Moreover, this isn't the first time
> you've asserted that someone hasn't read one of your emails. So if
> we're going to say what we each think would be helpful, then I think
> it would be helpful if you stopped accusing the people who are taking
> time to provide feedback on your work of having failed to read your
> emails. It's possible that there may be instances where that problem
> exists, but it beggars credulity to suppose that the repeated
> unanimous consensus against some of your design decisions is entirely
> an artifact of failure to pay attention.

Okay.

> The fact is, I don't feel obliged to respond to every one of your
> emails, just as you don't respond to every one of mine. If you want
> this patch to ever get committed, it's your job to push it forward -
> not mine, not Simon's, and not Heikki's. Sometimes that means you
> have to solve hard problems instead of just articulating what they
> are.

Okay.

> You're conflating the user-visible syntax with the parse tree
> representation in way that is utterly without foundation. I don't
> have a position at this point on which parse-analysis representation
> is preferable, but it's completely wrong-headed to say that you
> "can't" make NEW.x produce the same parse-analysis output that your
> CONFLICTING(x) syntax would have created. Sure, it might be harder,
> but it's not that much harder, and it's definitely not an unsolvable
> problem.

I don't believe I did. The broader point is that the difficulty in
making that work reflects the conceptually messiness, from
user-visible aspects down. I can work with the difficulty, and I may
even be able to live with the messiness, but I'm trying to bring the
problems with it to a head sooner rather than later for good practical
reasons. In all sincerity, my real concern is that you or the others
will change your mind when I actually go and implement an OLD.* style
syntax, and see the gory details. I regret it if to ask this is to ask
too much of you, but FYI that's the thought process behind it.

> I do acknowledge that there might be a better syntax out there than
> NEW.x and OLD.x. I have not seen one proposed that I like better.
> Feel free to propose something. But don't keep re-proposing something
> that LOOKSLIKE(this) because nobody - other than perhaps you - likes
> that.

Okay.

So in an UPDATE targetlist, you can assign DEFAULT to a column. Maybe
that's an interesting precedent. During rewriting, this gets rewritten
such that you end up with something that looks to the planner as if
the original query included a constant (this actually comes from a
catalog look-up for the column during rewriting). What if we spelled
EXCLUDING/CONFLICTING as follows:

INSERT INTO upsert VALUES(1, 'Art') ON CONFLICT (key) UPDATE SET val =
EXCLUDED || 'this works' WHERE another_col != EXCLUDED;

Then rewriting would figure these details out. From a design
perspective, there'd need to be a few details worked out about how
inference actually works - inferring *which* column the EXCLUDED
expression actually referred to, but it seems doable, especially given
the existing restrictions on the structure of the UPDATE. We're not
rewriting from a SetToDefault to a constant, but a SetToDefault-like
thing to a special Var (actually, the finished representation probably
makes it to the execution stage with that Var representation filled
in, unlike SetToDefault, but it's basically the same pattern). It
solves my problem with dummy range table entries. Actually, *any* new
kind of expression accomplishes this just as well. My concern here is
more around not needing cute tricks with dummy RTEs than it is around
being in favor of any particular expression-based syntax.

What do you think of that?

> And don't use the difficulty of parse analysis as a
> justification for your proposed syntax, because, except in extreme
> cases, there are going to be very few if any regular contributors to
> this mailing list who will accept that as a justification for one
> syntax over another. Syntax needs to be justified by being beautiful,
> elegant, precedent-driven, orthogonal, and minimalist - not by whether
> you might need an extra 25-75 lines of parse analysis code to make it
> work.

Well, that isn't the case here. I'd much rather code my way out of a
disagreement with you.

>>>> Unique index inference (i.e. the way we figure out *which* unique
>>>> index to use) occurs during parse analysis. I think it would be
>>>> inappropriate, and certainly inconvenient to do it during planning.
>>>
>>> You're wrong. The choice of which index to use is clearly wildly
>>> inappropriately placed in the parse analysis phase, and if you think
>>> that has any chance of ever being committed by anyone, then you are
>>> presuming the existence of a committer who won't mind ignoring the
>>> almost-immediate revert request that would draw from, at the very
>>> least, Tom.
>>
>> Why? This has nothing to do with optimization.
>
> That is false. If there is more than one index that could be used,
> the system should select the best one. That is an optimization
> decision per se. Also, if a plan is saved - in the plancache, say, or
> in a view - the query can be re-planned if the index it depends on is
> dropped, but there's no way to do parse analysis.

Generating index paths for the UPDATE is a waste of cycles.
Theoretically, there could be an (a, b, c) unique index and a (c,b,a)
unique index, and those two might have a non-equal cost to scan. But
that almost certainly isn't going to happen in practice, since that's
a rather questionable indexing strategy, and even when it does, you're
going to have to insert into all the unique indexes a good proportion
of the time anyway, making the benefits of that approach pale in
comparison to the costs. And that's just the cost in CPU cycles, and
not code complexity.

I don't know why you want to bring a cost model, or choice of indexes
into this. It simply isn't comparable to how the system comes up with
which index to use in all other contexts. Now, I could easily make all
this happen during planning, just to not have to argue with you, but I
think that doing so is less similar to how things already work, not
more similar. It certainly doesn't imply more code reuse, since
get_relation_info() is clearly quite unsuitable.

> Well, I'm equally tired of being asked to review patches that respond
> to only a small percentage of the feedback already given. I, too,
> sometimes lack the time to incorporate the feedback of others into my
> patches. When that happens, I don't re-post them until I do have the
> time. I've even been known to drop patches altogether rather than
> continue arguing about them, as you will of course recall. There's no
> shame in taking longer to get something done, but asking other people
> to spend time on it when you haven't had time yourself can lead to
> frustrations.

From my point of view, I spent a significant amount of time making the
patch more or less match your proposed design for unique index
inference. It is discouraging to hear that you think I'm not
cooperating with community process. I'm doing my best. I think it
would be a bad idea for me to not engage with the community for an
extended period at this point. There were plenty of other issues
address by V1.3 that were not the CONFLICTING()/EXCLUDING thing that
you highlighted (or the other thing you highlighted around where to do
unique index inference).
--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Petr Jelinek 2014-10-24 23:13:37 Re: pg_background (and more parallelism infrastructure patches)
Previous Message Florian Pflug 2014-10-24 21:32:50 Re: Question about RI checks