From: feichanghong <feichanghong(at)qq(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, hu_yajun(at)qq(dot)com
Subject: Re: Improve the efficiency of _bt_killitems.
Date: 2024-11-03 11:56:43
Message-ID: tencent_890303415489ECBD2223A1F3E7C3D6A48405@qq.com
Lists: pgsql-hackers
> On Nov 1, 2024, at 18:50, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>
> On 01/11/2024 10:41, feichanghong wrote:
>>> On Nov 1, 2024, at 16:24, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>>>
>>> On 01/11/2024 09:19, feichanghong wrote:
>>>> Hi hackers,
>>>> In the _bt_killitems function, the following logic is present: we search to the right for an index item that matches the heap TID and attempt to mark it as dead. If that index item has already been marked as dead by other concurrent processes, we will continue
>>>> searching. However, there should not be any more matching index items
>>>> on the current page.
>>> Why could there not be more matching items on the page?
>>>
>>> Are you assuming a unique index? Even then it's not right; you can have multiple index entries point to different heap tuples with the same key, as long as they're not visible at the same time. For example, if you UPDATE or DELETE+INSERT a row.
>> Maybe I didn't describe it clearly. What I meant to say is that there shouldn't
>> be multiple index items on the current page pointing to the same heap TID (ctid),
>> rather than the same index key.
>
> Ah, gotcha. That makes sense.
>
> There's more we could do here in the similar vein:
>
> If the TID was found in a posting list, but not all of the items were dead, 'killtuple' is set to false so we will still continue the search.
>
> Then there's this comment:
>
>> /*
>> * Don't bother advancing the outermost loop's int iterator to
>> * avoid processing killed items that relate to the same
>> * offnum/posting list tuple. This micro-optimization hardly
>> * seems worth it. (Further iterations of the outermost loop
>> * will fail to match on this same posting list's first heap
>> * TID instead, so we'll advance to the next offnum/index
>> * tuple pretty quickly.)
>> */
>
> I think we should do that micro-optimization. It's a single line of code and a single instruction to advance 'i' variable AFAICS. While I agree with the comment that it doesn't matter much performance-wise, it seems simpler to just do it than explain why we don't do it.
Yes, I agree that this micro-optimization is not significant for performance,
but it still saves some CPU overhead and makes the code easier to understand.
In the test case below (I instrumented the _bt_killitems function in the code to
measure its execution time), _bt_killitems took 0.773 ms before the optimization
and 0.073 ms after it:
```
create table t(a int);
insert into t select 1 from generate_series(1, 2000);
create index on t(a);
delete from t;  -- every index entry now points to a dead heap tuple
explain analyze select * from t order by a;  -- the scan calls _bt_killitems for the dead TIDs
```
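As an optional cross-check (not part of the test above), the pageinspect extension
can be used to look at the index's first leaf page after the scan: the tids column
should show how the duplicate keys were merged into posting list tuples, and dead
should become true once _bt_killitems has marked the entries. The index name
t_a_idx is just the default name the test case above produces.
```
create extension if not exists pageinspect;
-- Inspect the first leaf page (block 0 is the metapage) of the index built above.
select itemoffset, dead, htid, tids
from bt_page_items('t_a_idx', 1);
```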
As you mentioned, there are two possible optimizations in _bt_killitems:
1. Skip killedItems entries that belong to the same offnum/posting list tuple, to
avoid unnecessary scans and checks. For example, if the heap TIDs of a posting
list tuple are {"(0,1)","(0,2)","(0,3)","(0,4)","(0,5)"}, five entries are added
to killedItems, all with the same indexOffset; the checks for the latter four are
meaningless.
2. Once an index tuple matching the specified heap TID has been found, there is no
need to keep searching the rest of the page.
Maybe we can make the following optimizations:
```
diff --git a/src/backend/access/nbtree/nbtutils.c b/src/backend/access/nbtree/nbtutils.c
index 1b75066fb51..d44134ac70a 100644
--- a/src/backend/access/nbtree/nbtutils.c
+++ b/src/backend/access/nbtree/nbtutils.c
@@ -4174,6 +4174,7 @@ _bt_killitems(IndexScanDesc scan)
BTPageOpaque opaque;
OffsetNumber minoff;
OffsetNumber maxoff;
+ OffsetNumber preoff = InvalidOffsetNumber;
int i;
int numKilled = so->numKilled;
bool killedsomething = false;
@@ -4229,6 +4230,10 @@ _bt_killitems(IndexScanDesc scan)
BTScanPosItem *kitem = &so->currPos.items[itemIndex];
OffsetNumber offnum = kitem->indexOffset;
+ if (offnum == preoff)
+ continue; /* skip duplicates */
+ preoff = offnum;
+
Assert(itemIndex >= so->currPos.firstItem &&
itemIndex <= so->currPos.lastItem);
if (offnum < minoff)
@@ -4238,6 +4243,7 @@ _bt_killitems(IndexScanDesc scan)
ItemId iid = PageGetItemId(page, offnum);
IndexTuple ituple = (IndexTuple) PageGetItem(page, iid);
bool killtuple = false;
+ bool headTidMatched = false;
if (BTreeTupleIsPosting(ituple))
{
@@ -4266,6 +4272,8 @@ _bt_killitems(IndexScanDesc scan)
if (!ItemPointerEquals(item, &kitem->heapTid))
break; /* out of posting list loop */
+ headTidMatched = true;
+
/*
* kitem must have matching offnum when heap TIDs match,
* though only in the common case where the page can't
@@ -4292,19 +4300,19 @@ _bt_killitems(IndexScanDesc scan)
}
/*
- * Don't bother advancing the outermost loop's int iterator to
- * avoid processing killed items that relate to the same
- * offnum/posting list tuple. This micro-optimization hardly
- * seems worth it. (Further iterations of the outermost loop
- * will fail to match on this same posting list's first heap
- * TID instead, so we'll advance to the next offnum/index
- * tuple pretty quickly.)
+ * We don't directly advance the outermost loop's int iterator
+ * here to avoid processing killed items related to the same
+ * offnum or posting list tuple. This micro-optimization will
+ * be completed at the beginning of the outermost loop.
*/
if (j == nposting)
killtuple = true;
}
else if (ItemPointerEquals(&ituple->t_tid, &kitem->heapTid))
+ {
killtuple = true;
+ headTidMatched = true;
+ }
/*
* Mark index item as dead, if it isn't already. Since this
@@ -4321,6 +4329,15 @@ _bt_killitems(IndexScanDesc scan)
killedsomething = true;
break; /* out of inner search loop */
}
+
+ /*
+ * If an index entry has already been matched with a heap TID, we
+ * do not search further because there should not be index entries
+ * pointing to the same heap TID in the index.
+ */
+ if (headTidMatched)
+ break;
+
offnum = OffsetNumberNext(offnum);
}
}
```
Best Regards,
Fei Changhong