From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Sergey Koposov <koposov(at)ast(dot)cam(dot)ac(dot)uk>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>, pgsql-hackers(at)postgresql(dot)org, Stephen Frost <sfrost(at)snowman(dot)net>
Subject: Re: 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
Date: 2012-05-31 19:25:28
Message-ID: CAHyXU0wH-2L=DHOXQmiDHBGRXxODCnQCHXB9rxcU1ju-mqksFQ@mail.gmail.com
Lists: pgsql-hackers
On Thu, May 31, 2012 at 1:50 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Thu, May 31, 2012 at 2:03 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
>> On Thu, May 31, 2012 at 11:54 AM, Sergey Koposov <koposov(at)ast(dot)cam(dot)ac(dot)uk> wrote:
>>> On Thu, 31 May 2012, Robert Haas wrote:
>>>
>>>> Oh, ho. So from this we can see that the problem is that we're
>>>> getting huge amounts of spinlock contention when pinning and unpinning
>>>> index pages.
>>>>
>>>> It would be nice to have a self-contained reproducible test case for
>>>> this, so that we could experiment with it on other systems.
>>>
>>>
>>> I created it a few days ago:
>>> http://archives.postgresql.org/pgsql-hackers/2012-05/msg01143.php
>>>
>>> It is still valid, and it is exactly what I'm using to test. The only thing
>>> to change is to create a two-col index and drop another index.
>>> The scripts are precisely the ones I'm using now.
>>>
>>> The problem is that in order to see a really big slowdown (10 times slower
>>> than a single thread) I had to raise the buffers to 48g, but it was slow
>>> for smaller shared buffer settings as well.
>>>
>>> But I'm not sure how sensitive the test is to the hardware.
>>
>> It's not: high contention on spinlocks is going to suck no matter what
>> hardware you have. I think the problem is pretty obvious now: any
>> case where multiple backends are scanning the same sequence of buffers
>> in a very tight loop is going to display this behavior. It doesn't
>> come up that often: it takes a pretty unusual sequence of events to
>> get a bunch of backends hitting the same buffer like that.
>>
>> Hm, I wonder if you could alleviate the symptoms by making the
>> Pin/UnpinBuffer smarter so that frequently pinned buffers could stay
>> pinned longer -- kinda as if your private ref count was hacked to be
>> higher in that case. It would be a complex fix for a narrow issue
>> though.
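(To make that a bit more concrete, here's the rough shape of what I was
kicking around -- a standalone toy with made-up names and stub functions,
not actual bufmgr code:)

/*
 * Toy sketch of the "stay pinned longer" idea: each backend keeps a
 * local refcount plus a lifetime pin counter per buffer; once a buffer
 * looks hot, the backend keeps holding one shared pin so the tight
 * pin/unpin loop stops touching the buffer header.  The Shared* stubs
 * stand in for the real spinlock-protected shared-memory traffic.
 */
#include <stdbool.h>
#include <stdio.h>

#define HOT_PIN_THRESHOLD 16

static int shared_header_traffic = 0;   /* times we touched "shared memory" */

static void SharedPin(int buf_id)   { (void) buf_id; shared_header_traffic++; }
static void SharedUnpin(int buf_id) { (void) buf_id; shared_header_traffic++; }

typedef struct LocalPinState
{
    int  local_refcount;    /* pins this backend holds right now */
    int  pin_count;         /* lifetime pins by this backend */
    bool sticky;            /* true once we decided to keep a shared pin */
} LocalPinState;

static void
pin_buffer(LocalPinState *lp, int buf_id)
{
    if (lp->local_refcount++ == 0 && !lp->sticky)
        SharedPin(buf_id);              /* shared memory only on 0 -> 1 */
    lp->pin_count++;
}

static void
unpin_buffer(LocalPinState *lp, int buf_id)
{
    if (--lp->local_refcount > 0)
        return;
    if (lp->pin_count >= HOT_PIN_THRESHOLD)
        lp->sticky = true;              /* hot: quietly keep the shared pin */
    else
        SharedUnpin(buf_id);
    /* sticky pins would get dropped later, e.g. at end of scan/transaction */
}

int
main(void)
{
    LocalPinState lp = {0, 0, false};

    for (int i = 0; i < 1000000; i++)   /* simulate a tight index scan loop */
    {
        pin_buffer(&lp, 42);
        unpin_buffer(&lp, 42);
    }
    printf("shared buffer header touched %d times\n", shared_header_traffic);
    return 0;
}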
>
> This test case is unusual because it hits a whole series of buffers
> very hard. However, there are other cases where this happens on a
> single buffer that is just very, very hot, like the root block of a
> btree index, where the pin/unpin overhead hurts us. I've been
> thinking about this problem for a while, but it hasn't made it up to
> the top of my priority list, because workloads where pin/unpin is the
> dominant cost are still relatively uncommon. I expect them to get
> more common as we fix other problems.
>
> Anyhow, I do have some vague thoughts on how to fix this. Buffer pins
> are a lot like weak relation locks, in that they are a type of lock
> that is taken frequently, but rarely conflicts. And the fast-path
> locking in 9.2 provides a demonstration of how to handle this kind of
> problem efficiently: making the weak, rarely-conflicting locks
> cheaper, at the cost of some additional expense when a conflicting
> lock (in this case, a buffer cleanup lock) is taken. In particular,
> each backend has its own area to record weak relation locks, and a
> strong relation lock must scan all of those areas and migrate any
> locks found there to the main lock table. I don't think it would be
> feasible to adopt exactly this solution for buffer pins, because page
> eviction and buffer cleanup locks, while not exactly common, are
> common enough that we can't require a scan of N per-backend areas
> every time one of those operations occurs.
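(For my own benefit, this toy is how I picture the fast-path pattern being
described -- per-backend slots for the weak case, a sweep of every backend's
slots for the strong case; the names are invented, not the real lock.c
structures:)

/*
 * Toy version of the fast-path pattern: weak locks go into a small
 * per-backend slot array under a per-backend mutex; a strong locker has
 * to sweep every backend's slots and migrate matching entries into the
 * main table before it can proceed.
 */
#include <pthread.h>
#include <stdio.h>

#define MAX_BACKENDS 8
#define FP_SLOTS     16

typedef struct BackendFastPath
{
    pthread_mutex_t mutex;
    unsigned int    relid[FP_SLOTS];    /* 0 = empty slot */
} BackendFastPath;

static BackendFastPath fastpath[MAX_BACKENDS];

static void
fastpath_init(void)
{
    for (int b = 0; b < MAX_BACKENDS; b++)
        pthread_mutex_init(&fastpath[b].mutex, NULL);
}

/* stand-in for inserting the lock into the shared, heavyweight table */
static void
main_table_insert(int backend, unsigned int relid)
{
    printf("migrated relid %u from backend %d\n", relid, backend);
}

/* weak lock: touch only our own area -- no shared lock manager traffic */
static int
fastpath_grab(int my_backend, unsigned int relid)
{
    BackendFastPath *fp = &fastpath[my_backend];
    int ok = 0;

    pthread_mutex_lock(&fp->mutex);
    for (int i = 0; i < FP_SLOTS; i++)
        if (fp->relid[i] == 0)
        {
            fp->relid[i] = relid;
            ok = 1;
            break;
        }
    pthread_mutex_unlock(&fp->mutex);
    return ok;          /* caller falls back to the main table if slots are full */
}

/* strong lock: the expensive part -- visit every backend's area */
static void
strong_lock_prepare(unsigned int relid)
{
    for (int b = 0; b < MAX_BACKENDS; b++)
    {
        BackendFastPath *fp = &fastpath[b];

        pthread_mutex_lock(&fp->mutex);
        for (int i = 0; i < FP_SLOTS; i++)
            if (fp->relid[i] == relid)
            {
                main_table_insert(b, relid);
                fp->relid[i] = 0;
            }
        pthread_mutex_unlock(&fp->mutex);
    }
}

int
main(void)
{
    fastpath_init();
    fastpath_grab(3, 16384);        /* backend 3 takes a weak lock cheaply */
    strong_lock_prepare(16384);     /* strong locker sweeps and migrates it */
    return 0;
}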
>
> But, maybe we could have a system of this type that only applies to
> the very hottest buffers. Suppose we introduce two new buffer flags,
> BUF_NAILED and BUF_NAIL_REMOVAL. When we detect excessive contention
> on the buffer header spinlock, we set BUF_NAILED. Once we do that,
> the buffer can't be evicted until that flag is removed, and backends
> are permitted to record pins in a per-backend area protected by a
> per-backend spinlock or lwlock, rather than in the buffer header.
> When we want to un-nail the buffer, we set BUF_NAIL_REMOVAL.
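Just to check that I'm reading this right, here is roughly how I picture the
pin side of it once a buffer gets nailed -- a toy sketch in plain C, only the
pin path, with every name invented rather than taken from bufmgr.c:

/*
 * Toy sketch of the nailed-buffer idea as I understand it: an ordinary
 * buffer pays the header lock on every pin; once traffic pushes it over
 * a threshold the buffer gets nailed, and later pins land in a
 * per-backend area instead.  The header lock is a mutex standing in for
 * the buffer-header spinlock, and a real design would need an interlock
 * so a locally recorded pin can't be missed by a concurrent nail removal.
 */
#include <pthread.h>

#define BUF_NAILED        0x01
#define BUF_NAIL_REMOVAL  0x02

#define NAIL_THRESHOLD    1000      /* crude stand-in for "excessive contention" */
#define NBUFFERS          1024

typedef struct BufferSketch
{
    pthread_mutex_t header_lock;    /* plays the buffer-header spinlock */
    int flags;
    int refcount;                   /* shared pin count */
    int pin_traffic;                /* pins seen; proxy for contention */
} BufferSketch;

/*
 * Per-backend pin area: thread-local here as a stand-in for a per-backend
 * shared array protected by a per-backend lock.
 */
static __thread int my_backend_pins[NBUFFERS];

static void
pin_buffer(BufferSketch *buf, int buf_id)
{
    /* Fast path: nailed and no nail removal in progress -> stay local. */
    if ((buf->flags & BUF_NAILED) && !(buf->flags & BUF_NAIL_REMOVAL))
    {
        my_backend_pins[buf_id]++;  /* never touches the header lock */
        return;
    }

    /* Slow path: the classic header-lock-protected pin. */
    pthread_mutex_lock(&buf->header_lock);
    buf->refcount++;
    if (++buf->pin_traffic > NAIL_THRESHOLD)
        buf->flags |= BUF_NAILED;   /* too hot: nail it */
    pthread_mutex_unlock(&buf->header_lock);
}

static void
start_nail_removal(BufferSketch *buf)
{
    pthread_mutex_lock(&buf->header_lock);
    buf->flags |= BUF_NAIL_REMOVAL; /* new pins go back to the slow path */
    pthread_mutex_unlock(&buf->header_lock);

    /*
     * Next the remover would sweep every backend's pin area, fold the
     * local counts back into refcount, and finally clear both flags;
     * that sweep is exactly the expensive step nailing is meant to keep
     * rare, so it's omitted here.
     */
}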
Hm, a couple of questions: how do you determine if/when to un-nail a
buffer, and who makes that decision (bgwriter)? Is there a limit to
how many buffers you are allowed to nail? It seems like a much
stronger idea, but one downside I see vs. the 'pin for longer' idea I
was kicking around is how to deal with stale nailed buffers and keep
their number from growing uncontrollably, to the point where you have
to either stop nailing or forcibly evict them.
merlin