Re: StrategyGetBuffer optimization, take 2

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Subject: Re: StrategyGetBuffer optimization, take 2
Date: 2013-08-08 18:53:55
Message-ID: CAHyXU0yxyS_rmcRXwB0H2in5J7sMMOQYPvcrW-5_YxKsCRuXcw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Aug 7, 2013 at 11:52 PM, Amit Kapila <amit(dot)kapila(at)huawei(dot)com> wrote:
>> -----Original Message-----
>> From: pgsql-hackers-owner(at)postgresql(dot)org [mailto:pgsql-hackers-
>> owner(at)postgresql(dot)org] On Behalf Of Merlin Moncure
>> Sent: Thursday, August 08, 2013 12:09 AM
>> To: Andres Freund
>> Cc: PostgreSQL-development; Jeff Janes
>> Subject: Re: [HACKERS] StrategyGetBuffer optimization, take 2
>>
>> On Wed, Aug 7, 2013 at 12:07 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>> > On 2013-08-07 09:40:24 -0500, Merlin Moncure wrote:
>> >> > I don't think the unlocked increment of nextVictimBuffer is a good idea
>> >> > though. nextVictimBuffer jumping over NBuffers under concurrency seems
>> >> > like a recipe for disaster to me. At the very, very least it will need a
>> >> > good wad of comments explaining what it means and how you're allowed to
>> >> > use it. The current way will lead to at least bgwriter accessing a
>> >> > nonexistent/out-of-bounds buffer via StrategySyncStart().
>> >> > Possibly it won't even save that much, it might just increase the
>> >> > contention on the buffer header spinlock's cacheline.
>> >>
>> >> I agree; at least then it's not unambiguously better. If you (in
>> >> effect) swap all contention on allocation from a lwlock to a spinlock
>> >> it's not clear if you're improving things; it would have to be proven
>> >> and I'm trying to keep things simple.
>> >
>> > I think converting it to a spinlock actually is a good idea, you just
>> > need to expand the scope a bit.
>>
>> All right: I'll work up another version doing a full spinlock and
>> see how things shake out in performance.
>>
>> > FWIW, I am not convinced this is the trigger for the problems you're
>> > seeing. It's a good idea nonetheless though.
>>
>> I have some very strong evidence that the problem is coming out of the
>> buffer allocator. Exhibit A is that Vlad's presentation of the
>> problem was on a read-only load (if not the allocator lock, then what?).
>> Exhibit B is that lowering shared_buffers to 2GB seems to have (so
>> far, 5 days in) fixed the issue. This problem shows up on fast
>> machines with fast storage and lots of cores. So what I think is
>> happening is that, with very large buffer settings, usage_count starts
>> creeping up faster than it gets cleared by the sweep, which in turn
>> causes the 'problem' buffers to be analyzed for eviction more often.
>
> Yes, one idea which was discussed previously is to not increase the usage
> count every time a buffer is pinned.
> I am also working on some optimizations in a similar area, which you
> can refer to here:
>
> http://www.postgresql.org/message-id/006e01ce926c$c7768680$56639380$@kapila@huawei.com

Yup -- I just took a quick look at your proposed patch. You're
attacking the 'freelist' side of buffer allocation, whereas my
stripped-down patch addresses issues with the clock sweep. I think your
approach is a good idea, but more than I wanted to get into personally.
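
For readers following along, here is a rough Python model of the two
allocation paths being contrasted: the freelist and the clock sweep. It
is only a sketch under stated assumptions, not the PostgreSQL code: the
BufferPool class, its touch() and get_buffer() methods, and the cap of 5
on usage_count are illustrative choices, and locking and pinning are
left out entirely.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed cap on usage_count, chosen for this toy model.
MAX_USAGE_COUNT = 5


@dataclass
class Buffer:
    usage_count: int = 0


class BufferPool:
    """Toy model of buffer allocation: a freelist plus a clock sweep."""

    def __init__(self, nbuffers: int) -> None:
        self.buffers: List[Buffer] = [Buffer() for _ in range(nbuffers)]
        self.freelist: List[int] = list(range(nbuffers))  # all free at start
        self.next_victim = 0  # the clock hand

    def touch(self, buf_id: int) -> None:
        """Model a buffer access: bump usage_count up to the cap."""
        buf = self.buffers[buf_id]
        buf.usage_count = min(buf.usage_count + 1, MAX_USAGE_COUNT)

    def get_buffer(self) -> Tuple[int, int]:
        """Return (buffer id, number of buffers the sweep inspected)."""
        # Freelist path: hand out a never-used buffer without sweeping.
        if self.freelist:
            return self.freelist.pop(0), 0

        # Clock-sweep path: advance the hand, decrementing usage_count,
        # until a buffer with usage_count == 0 is found.
        inspected = 0
        while True:
            buf_id = self.next_victim
            self.next_victim = (self.next_victim + 1) % len(self.buffers)
            inspected += 1
            buf = self.buffers[buf_id]
            if buf.usage_count == 0:
                return buf_id, inspected
            buf.usage_count -= 1


if __name__ == "__main__":
    for touches in (1, 5):
        pool = BufferPool(nbuffers=4)
        for _ in range(4):
            pool.get_buffer()  # drain the freelist
        for buf_id in range(4):
            for _ in range(touches):
                pool.touch(buf_id)
        victim, inspected = pool.get_buffer()
        print(f"touches={touches}: victim={victim}, inspected={inspected}")
```

In this model with four buffers, one touch per buffer costs 5
inspections to find a victim, while five touches per buffer cost 21.
That is the effect described upthread: the higher usage_count climbs
between sweeps, the more work each allocation does.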

The good news is that both patches should essentially bolt on together
AFAICT. I propose we consolidate our performance testing efforts and
run tests with patch A, B, and AB in various scenarios. I have a
16-core VM (4GB RAM) that I can test with, and I want to start with,
say, a high-concurrency test on a 2GB database with 1GB shared_buffers
and see how it burns in. What do you think? Are you at a point where we
can run some tests?
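
As a sketch of how the runs could be enumerated, the snippet below
builds the test matrix in Python. The patch combinations, database size
and shared_buffers value come from the proposal above; the unpatched
baseline, the client counts and the build_test_matrix() helper are
assumptions added purely for illustration.

```python
from itertools import product

# From the proposal: three patch combinations, 2GB database, 1GB shared_buffers.
PATCH_SETS = ["A", "B", "AB"]
DATABASE_SIZE_GB = 2
SHARED_BUFFERS_GB = 1

# Assumptions (not from the proposal): a baseline build and these client counts.
BASELINE = "unpatched"
CLIENT_COUNTS = [16, 64, 256]


def build_test_matrix(include_baseline=True):
    """Return one dict per (build, client count) combination to be run."""
    builds = ([BASELINE] if include_baseline else []) + PATCH_SETS
    return [
        {
            "build": build,
            "clients": clients,
            "database_gb": DATABASE_SIZE_GB,
            "shared_buffers_gb": SHARED_BUFFERS_GB,
        }
        for build, clients in product(builds, CLIENT_COUNTS)
    ]


if __name__ == "__main__":
    for run in build_test_matrix():
        print(run)
```

Including a baseline build gives each patched result something to be
compared against; drop it with include_baseline=False if only the
patched builds are of interest.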

merlin
