Re: Faster inserts with mostly-monotonically increasing values

From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Simon Riggs <simon(dot)riggs(at)2ndquadrant(dot)com>
Subject: Re: Faster inserts with mostly-monotonically increasing values
Date: 2018-03-14 15:05:45
Message-ID: CAGTBQpbZVR2HjdwpxroeocusL5+n=Hn0JbzJGbB8LPtGKxQpNA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 14, 2018 at 1:36 AM, Pavan Deolasee
<pavan(dot)deolasee(at)gmail(dot)com> wrote:
>
>
> On Sun, Mar 11, 2018 at 9:18 PM, Claudio Freire <klaussfreire(at)gmail(dot)com>
> wrote:
>>
>> On Sun, Mar 11, 2018 at 2:27 AM, Pavan Deolasee
>>
>> >
>> > Yes, I will try that next - it seems like a good idea. So the idea would
>> > be:
>> > check if the block is still the rightmost block and the insertion-key is
>> > greater than the first key in the page. If those conditions are
>> > satisfied,
>> > then we do a regular binary search within the page to find the correct
>> > location. This might add an overhead of binary search when keys are
>> > strictly
>> > ordered and a single client is inserting the data. If that becomes a
>> > concern, we might be able to look for that special case too and optimise
>> > for
>> > it too.
>>
>> Yeah, pretty much that's the idea. Beware, if the new item doesn't
>> fall in the rightmost place, you still need to check for serialization
>> conflicts.
>
>
> So I've been toying with this idea since yesterday and I am quite puzzled
> with the results. See the attached patch which compares the insertion key
> with the last key inserted by this backend, if the cached block is still the
> rightmost block in the tree. I initially only compared with the first key in
> the page, but I tried this version because of the strange performance
> regression which I still have no answers.
>
> For a small number of clients, the patched version does better. But as the
> number of clients go up, the patched version significantly underperforms
> master. I roughly counted the number of times the fastpath is taken and I
> noticed that almost 98% inserts take the fastpath. I first thought that the
> "firstkey" location in the page might be becoming a hot-spot for concurrent
> processes and hence changed that to track the per-backend last offset and
> compare against that the next time. But that did not help much.

+ _bt_compare(rel, natts, itup_scankey, page,
+ RelationGetLastOffset(rel)) >= 0)

Won't this access garbage if the last offset is stale and beyond the
current rightmost page's last offset?

I'd suggest simply using P_FIRSTDATAKEY after checking that the page
isn't empty (see _bt_binsrch).

On Wed, Mar 14, 2018 at 11:12 AM, Pavan Deolasee
<pavan(dot)deolasee(at)gmail(dot)com> wrote:
>> > Hmm. I can try that. It's quite puzzling though that slowing down things
>> > actually make them better.
>>
>> That's not what is happening though.
>>
>> The cache path is 1) try to lock cached block, 2) when got lock check
>> relevance, if stale 3) recheck from top
>>
>> The non-cached path is just 3) recheck from top
>>
>> The overall path length is longer in the cached case but provides
>> benefit if we can skip step 3 in high % of cases. The non-cached path
>> is more flexible because it locates the correct RHS block, even if it
>> changes dynamically between starting the recheck from top.
>>
>
> So as I noted in one of the previous emails, the revised patch still takes
> fast path in 98% cases. So it's not clear why the taking steps 1, 2 and 3 in
> just 2% cases should cause such dramatic slowdown.

Real-world workloads will probably take the slow path more often, so
it's probably worth keeping the failure path as contention-free as
possible.

Besides, even though it may be just 2% the times it lands there, it
could still block for a considerable amount of time for no benefit.

So I guess a conditional lock is not a bad idea.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2018-03-14 15:06:41 Re: Failed to request an autovacuum work-item in silence
Previous Message Jeevan Chalke 2018-03-14 14:21:16 Re: [HACKERS] Partition-wise aggregation/grouping