Re: Scaling shared buffer eviction

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Scaling shared buffer eviction
Date: 2014-09-03 14:33:19
Message-ID: CA+Tgmobm8bPcuGPFEtY1tn+_R+CqzF8qN3wHqQmUKb+upU97TQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Sep 3, 2014 at 7:27 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> +Background Reclaimer's Processing
>> +---------------------------------
>>
>> I suggest titling this section "Background Reclaim".
>
> I don't mind changing it, but currently used title is based on similar
> title "Background Writer's Processing". It is used in previous
> paragraph. Is there a reason to title this differently?

Oh, I didn't see that. Seems like weird phrasing to me, but I guess
it's probably better to keep it consistent.

>> +The background reclaimer is designed to move buffers to freelist that are
>>
>> I suggest replacing the first three words of this sentence with
>> "bgreclaimer".
>
> Again what I have used is matching with BgWriter's explanation. I thought
> it would be better if wording used is similar.

OK.

>> + while (tmp_num_to_free > 0)
>>
>> I am not sure it's a good idea for this value to be fixed at loop
>> start and then just decremented.
>
> It is based on the idea what bgwriter does for num_to_scan and
> calling it once has advantage that we need to take freelist_lck
> just once.

Right, we shouldn't call it every loop iteration. However, consider
this scenario: there are no remaining buffers on the list and the high
watermark is 2000. We add 2000 buffers to the list. But by the time
we get done, other backends have already done 500 more allocations, so
now there are only 1500 buffers on the list. If this should occur, we
should add an additional 500 buffers to the list before we consider
sleeping. We want bgreclaimer to be able to run continuously if the
demand for buffers is high enough.

>> In freelist.c, it seems like a poor idea to have two spinlocks as
>> consecutive structure members; they'll be in the same cache line,
>> leading to false sharing. If we merge them into a single spinlock,
>> does that hurt performance?
>
> I have kept them separate so that backends searching for a buffer
> in freelist doesn't contend with bgreclaimer (while doing clock sweep)
> or clock sweep being done by other backends. I think it will be bit
> tricky to devise a test where this can hurt, however it doesn't seem
> too bad to have two separate locks in this case.

It's not. But if they are in the same cache line, they will behave
almost like one lock, because the CPU will lock the entire cache line
for each atomic op. See Tom's comments upthread.

> Okay, but this patch hasn't changed anything w.r.t above comment,
> so I haven't changed it. Do you want me to remove second part of
> comment starting with "(This can only happen"?

Right. Clearly it can happen again once we have this patch: that's
the entire point of the patch.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2014-09-03 14:39:22 Re: Audit of logout
Previous Message Jan Wieck 2014-09-03 14:24:05 Re: PL/pgSQL 2