From: | Peter Geoghegan <pg(at)bowt(dot)ie> |
---|---|
To: | Marko Tiikkaja <marko(at)joh(dot)to> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: INSERT .. ON CONFLICT DO SELECT [FOR ..] |
Date: | 2017-09-04 02:09:43 |
Message-ID: | CAH2-Wzk+FpVAjnto=U9jFKZs_SWUU6G1ZPSWowSqLNATgCyRSA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Aug 15, 2017 at 12:17 AM, Marko Tiikkaja <marko(at)joh(dot)to> wrote:
> On Tue, Aug 15, 2017 at 7:43 AM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>>
>> On Mon, Aug 14, 2017 at 6:23 PM, Marko Tiikkaja <marko(at)joh(dot)to> wrote:
>> > Attached is a patch for $SUBJECT. It might still be a bit rough around
>> > the
>> > edges and probably light on docs and testing, but I thought I'd post it
>> > anyway so I won't forget.
>>
>> Is it possible for ON CONFLICT DO SELECT to raise a cardinality
>> violation error? Why or why not?
>
>
> No. I don't see when that would need to happen. But I'm guessing you have
> a case in mind?
Actually, no, I didn't. But I wondered if you did. I think that it
makes some sense not to, now that I think about it. ON CONFLICT DO
NOTHING doesn't have cardinality violations, because it cannot affect
a row twice if there are duplicates proposed for insertion (at least,
not through any ON CONFLICT related codepath). But, this opinion only
applies to ON CONFLICT DO SELECT, not ON CONFLICT DO SELECT FOR
UPDATE. And I have other reservations, which I'll go in to
momentarily, about ON CONFLICT DO SELECT in general. So, the upshot is
that I think we need cardinality violations in all cases for this
feature. Why would a user ever not want to know that the row was
locked twice?
On to the subject of my more general reservation: Why did you include
ON CONFLICT DO SELECT at all? Why not just ON CONFLICT DO SELECT FOR
UPDATE (and FOR SHARE, ...) ?
I think I know what you're going to say about it: ON CONFLICT DO
NOTHING doesn't lock the conflicting row, so why should I insist on it
here (why not have an ON CONFLICT DO SELECT variant, too)? That's not
a bad argument. However, I think that there is still a difference.
Namely, ON CONFLICT DO NOTHING doesn't let you project the rows with
RETURNING (that may not even be visible to the command's MVCC snapshot
-- the rows that we also don't lock), because those are simply not the
semantics it has. DO NOTHING is more or less about ETL use-cases, some
of which involve data from very unclean sources. It seems questionable
to cite that as precedent for not locking a row (that is, for having a
plain ON CONFLICT DO SELECT variant at all).
In other words, while ON CONFLICT DO NOTHING may have set a precedent
here, it at least has semantics that limit the consequences of not
locking the row; and it *feels* a little bit dirty to use it
indifferently, even where that makes sense. ON CONFLICT DO SELECT is
probably going to be used within wCTEs some of the time. I'm not sure
that a plain ON CONFLICT DO SELECT variant won't allow unpredictable,
complicated problems when composed within a more complicated query.
(This brings me back!)
In other other words, plain SELECT FOR UPDATE has to do all the EPQ
stuff, in order to confront many of the same issues that ON CONFLICT
must confront in its own way [1], but a plain SELECT never does any
EPQ stuff at all. It seems to follow that a plain ON CONFLICT DO
SELECT is an oxymoron. If I've missed the point of ON CONFLICT DO
SELECT, please let me know how.
[1] https://wiki.postgresql.org/wiki/UPSERT#Visibility_issues_and_the_proposed_syntax_.28WHERE_clause.2Fpredicate_stuff.29
--
Peter Geoghegan
From | Date | Subject | |
---|---|---|---|
Next Message | Masahiko Sawada | 2017-09-04 02:10:42 | Re: pgbench: Skipping the creating primary keys after initialization |
Previous Message | Amit Langote | 2017-09-04 01:10:41 | Re: path toward faster partition pruning |