From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, amul sul <sulamul(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [HACKERS] Restrict concurrent update/delete with UPDATE of partition key |
Date: | 2018-04-06 04:11:07 |
Message-ID: | CAA4eK1K0YJ-qWrK5wwOhrPmzMavMZS54f0TNZzUyXW6=CorD2Q@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, Apr 6, 2018 at 1:13 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2018-04-05 10:17:59 +0530, Amit Kapila wrote:
>> On Thu, Apr 5, 2018 at 7:14 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>> Why? tid is both an input and output parameter. The input tid is
>> valid and is verified at the top of the function, now if no row
>> version is visible, then it should have the same value as passed Tid.
>> I am not telling that it was super important to have that assertion,
>> but if it is valid then it can catch a case where we might have missed
>> checking the tuple which has invalid block number (essentialy the case
>> introduced by the patch).
>
> You're right. It's bonkers that the output parameter isn't set to an
> invalid value if the tuple isn't found. Makes the whole function
> entirely useless.
>
Yeah, kind of, but I think the same is noted in function comments and
caller is prepared to deal with it. See comments atop
heap_get_latest_tid (Note that it will not be changed if no version of
the row passes the snapshot test.). The caller (TidNext) will just
ignore such TID and move to next TID. I think you have a point that
one might have designed it differently by setting output value to
invalid value which would make caller to detect it easily. In short,
it's just a matter of choice whether we want to have Assert as Amul
has it in his patch or just leave. It should be okay either way.
>
>
>> > - should heap_get_latest_tid() error out when the chain ends in a moved
>> > tuple?
>>
>> Won't the same question applies to the similar usage in
>> EvalPlanQualFetch and heap_lock_updated_tuple_rec.
>
> I don't think so?
>
>
>> In EvalPlanQualFetch, we consider such a tuple to be deleted and will
>> silently miss/skip it which seems contradictory to the places where we
>> have detected such a situation and raised an error.
>
> if (ItemPointerIndicatesMovedPartitions(&hufd.ctid))
> ereport(ERROR,
> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> errmsg("tuple to be locked was already moved to another partition due to concurrent update")));
>
>
I was talking about the case when the tuple version is not visible aka
the below code:
/*
* If we get here, the tuple was found but failed SnapshotDirty.
..
*/
if (HeapTupleHeaderIndicatesMovedPartitions(tuple.t_data) ||
ItemPointerEquals(&tuple.t_self, &tuple.t_data->t_ctid))
{
/* deleted, so forget about it */
ReleaseBuffer(buffer);
return NULL;
}
Normally, if the tuple would have been updated such that it landed in
the same partition, then the chain would have continued, but now
because tuple is moved to another partition, we would end the chain
without letting the user know about it. Just see below the
repercussion of same.
>> In heap_lock_updated_tuple_rec, we will skip locking the versions of a
>> tuple after we encounter a tuple version that is moved to another
>> partition.
>
> I don't think that's true? We'll not lock *any* tuple in that case,
>
I think what will happen is that we will end up locking some versions
in the chain and then silently skip others. See below example:
Setup
-----------
postgres=# create table t1(c1 int, c2 varchar) partition by range(c1);
CREATE TABLE
postgres=# create table t1_part_1 partition of t1 for values from (1) to (100);
CREATE TABLE
postgres=# create table t1_part_2 partition of t1 for values from
(100) to (200);
CREATE TABLE
postgres=# insert into t1 values(1, 'aaa');
INSERT 0 1
Session-1
---------------
postgres=# begin;
BEGIN
postgres=# update t1 set c2='bbb' where c1=1;
UPDATE 1
postgres=# update t1 set c2='ccc' where c1=1;
UPDATE 1
postgres=# update t1 set c1=102 where c1=1;
UPDATE 1
Session-2
----------------
postgres=# begin;
BEGIN
postgres=# select * from t1 where c1=1 for key share;
Here, the Session-2 will lock one of the tuple versions and then wait
for Session-1 to end (as there is a conflicting update). Now, commit
the transaction in Session-1.
Session-1
---------------
Commit;
Now the Session-2 will skip the latest version of a tuple as it is
moved to another partition.
Session-2
----------------
postgres=# select * from t1 where c1=1 for key share;
c1 | c2
----+----
(0 rows)
The end result is that Session-2 will lock one of the versions of the
tuple and silently skipped locking latest version of the tuple. I
feel that is slightly confusing behavior with respect to the current
behavior or when tuple updates landed in the same partition.
I think if we return an error in EvalPlanQualFetch at the place
mentioned above, the behavior will be sane.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
From | Date | Subject | |
---|---|---|---|
Next Message | Pavan Deolasee | 2018-04-06 04:31:58 | Re: [HACKERS] MERGE SQL Statement for PG11 |
Previous Message | John Naylor | 2018-04-06 04:08:30 | Re: WIP: a way forward on bootstrap data |