
Re: Page replacement algorithm in buffer cache

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
Cc:	Greg Smith <greg(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Page replacement algorithm in buffer cache
Date:	2013-04-09 03:58:13
Message-ID:	CA+TgmoYhWsz__KtSxm6BuBirE7VR6Qqc_COkbEZTQpk8oom3CA@mail.gmail.com
Lists:	pgsql-hackers

On Fri, Apr 5, 2013 at 11:08 PM, Amit Kapila <amit(dot)kapila(at)huawei(dot)com> wrote:
> I still have one more doubt. Consider the scenario below, comparing
> invalidating buffers when moving them to the freelist vs. just moving them
> to the freelist:
>
> A backend gets a buffer from the freelist for a request for page 9 (the
> number 9 is arbitrary, just for illustration), but the buffer is still
> associated with another page, page 10. The backend needs to add the buffer
> to the bufhash table under the new tag (new page association) and remove
> the entry with the oldTag (old page association).
>
> The benefit of just moving to the freelist is that if the same page is
> requested again before somebody else reuses the buffer for another page,
> it saves a read I/O. On the other hand, in many cases the backend will
> need an extra partition lock to remove the oldTag (which can lead to
> some bottleneck).
>
> I think saving read I/O is more beneficial, but I'm not sure that is best,
> as such cases might be rare.
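The retagging step described above can be sketched as a toy partitioned mapping table. This is a minimal sketch under stated assumptions, not PostgreSQL's buffer manager: the class and method names are hypothetical, tags are plain page numbers, and the partition count of 16 is illustrative. It shows why reusing a buffer that still carries an old tag can cost a second partition lock: the old and new tags may hash to different partitions, and both must be held at once, lower-numbered first so that two backends retagging in opposite directions cannot deadlock (as far as I know, PostgreSQL's BufferAlloc() follows the same ordering rule). Had the buffer been invalidated when it went onto the freelist, the backend would only need insert(), which takes a single lock.

```python
import threading

NUM_PARTITIONS = 16  # illustrative partition count (assumption)


class BufferMapping:
    """Toy partitioned map from page tag (an int here) to buffer id."""

    def __init__(self, num_partitions=NUM_PARTITIONS):
        self.locks = [threading.Lock() for _ in range(num_partitions)]
        self.tables = [{} for _ in range(num_partitions)]

    def partition(self, tag):
        return hash(tag) % len(self.tables)

    def lookup(self, tag):
        p = self.partition(tag)
        with self.locks[p]:
            return self.tables[p].get(tag)

    def insert(self, tag, buf_id):
        # Path for an already-invalidated buffer: one partition lock.
        p = self.partition(tag)
        with self.locks[p]:
            self.tables[p][tag] = buf_id

    def invalidate(self, tag, buf_id):
        # Drop the old association, e.g. when the buffer goes on the freelist.
        p = self.partition(tag)
        with self.locks[p]:
            if self.tables[p].get(tag) == buf_id:
                del self.tables[p][tag]

    def retag(self, old_tag, new_tag, buf_id):
        """Move buf_id from old_tag to new_tag; return how many locks were held.

        Both partitions are locked together, in ascending partition order,
        to avoid deadlock between concurrent retags.
        """
        old_p, new_p = self.partition(old_tag), self.partition(new_tag)
        parts = sorted({old_p, new_p})
        for p in parts:
            self.locks[p].acquire()
        try:
            if self.tables[old_p].get(old_tag) == buf_id:
                del self.tables[old_p][old_tag]
            self.tables[new_p][new_tag] = buf_id
        finally:
            for p in reversed(parts):
                self.locks[p].release()
        return len(parts)
```

With 16 partitions, pages 10 and 9 fall into different partitions, so moving a buffer from page 10 to page 9 takes two locks; pages 25 and 41 share a partition, so retagging between them takes one.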

I think saving read I/O is a lot more beneficial. I haven't seen
evidence of a severe bottleneck updating the buffer mapping tables. I
have seen some evidence of spinlock-level contention on read workloads
that fit in shared buffers, because in that case the system can run
fast enough for the spinlocks protecting the lwlocks to get pretty
hot. But if you're doing writes, or if the workload doesn't fit in
shared buffers, other bottlenecks slow you down enough that this
doesn't really seem to become much of an issue.

Also, even if you *can* find some scenario where pushing the buffer
invalidation into the background is a win, I'm not convinced that
would justify doing it, because the case where it's a huge loss
(namely, a working set just a tiny bit smaller than shared_buffers) is
pretty obvious. I don't think we dare fool around with that; the
townspeople will arrive with pitchforks.
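The "huge loss" case can be illustrated with a small simulation. It is a hypothetical model, not PostgreSQL's code: it uses strict LRU ordering instead of clock sweep, scans the working set cyclically, and assumes a background process keeps a fixed number of buffers on the freelist. With 100 buffers and a target of 10 free ones, a 95-page working set re-reads every page on every pass if buffers are invalidated as they go onto the freelist, whereas keeping their tags lets each page be reclaimed from the freelist without I/O.

```python
from collections import OrderedDict


def count_reads(working_set, cache_size, freelist_target, passes, invalidate):
    """Physical reads for `passes` cyclic scans over pages 0..working_set-1.

    Hypothetical model: before each access, a background process moves the
    least-recently-used buffers onto a freelist until `freelist_target`
    buffers are free (freelist_target >= 1 assumed). With invalidate=True
    the page tag is dropped at that point; otherwise it survives until the
    buffer is actually reused.
    """
    in_use = OrderedDict()          # page -> None, oldest access first
    free = [None] * cache_size      # each entry: surviving page tag, or None
    reads = 0
    for _ in range(passes):
        for page in range(working_set):
            # Background reclaim: keep the freelist topped up.
            while len(free) < freelist_target and in_use:
                victim, _ = in_use.popitem(last=False)
                free.append(None if invalidate else victim)
            if page in in_use:
                in_use.move_to_end(page)   # ordinary hit
                continue
            if page in free:
                free.remove(page)          # hit on a freelist buffer: no I/O
            else:
                reads += 1                 # miss: reuse the oldest free buffer
                free.pop(0)
            in_use[page] = None
    return reads
```

With a 50-page working set the freelist never needs refilling, and both policies do only the 50 initial reads; the difference appears exactly when the working set is just below the cache size.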

I believe that the big win here is getting the clock sweep out of the
foreground so that BufFreelistLock doesn't catch fire. The buffer
mapping locks are partitioned and, while it's not like that completely
gets rid of the contention, it sure does help a lot. So I would view
that goal as primary, at least for now. If we get a first round of
optimization done in this area, that doesn't preclude further
improving it in the future.
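A rough sketch of what getting the clock sweep out of the foreground could look like. In the code under discussion, as I understand it, backends ran the sweep while holding BufFreelistLock; here a hypothetical background worker runs the sweep and fills a freelist, and a backend holds the lock only long enough to pop one buffer. Everything beyond the clock-sweep idea itself is an assumption: the class and method names are made up, MAX_USAGE = 5 mirrors PostgreSQL's BM_MAX_USAGE_COUNT but is only illustrative, and usage counts and pins are not synchronized, so treat it as single-threaded.

```python
import threading
from collections import deque

MAX_USAGE = 5  # illustrative cap on a buffer's usage count (assumption)


class BufferPool:
    def __init__(self, nbuffers):
        self.usage = [0] * nbuffers
        self.pinned = [False] * nbuffers
        self.on_freelist = [False] * nbuffers
        self.hand = 0                          # clock-sweep position
        self.freelist = deque()
        self.freelist_lock = threading.Lock()  # plays the role of BufFreelistLock

    def touch(self, buf):
        # A backend accessed the buffer: bump its usage count (unsynchronized).
        self.usage[buf] = min(self.usage[buf] + 1, MAX_USAGE)

    def clock_sweep_one(self):
        """Advance the hand until an unpinned buffer with usage 0 is found."""
        n = len(self.usage)
        for _ in range(n * (MAX_USAGE + 1)):   # enough to drain every count
            buf = self.hand
            self.hand = (self.hand + 1) % n
            if self.pinned[buf] or self.on_freelist[buf]:
                continue
            if self.usage[buf] > 0:
                self.usage[buf] -= 1           # second chance
                continue
            return buf
        return None                            # no reclaimable buffer

    def background_refill(self, target):
        """Hypothetical background worker: sweep outside the foreground path."""
        while len(self.freelist) < target:
            buf = self.clock_sweep_one()
            if buf is None:
                break
            with self.freelist_lock:
                self.freelist.append(buf)
                self.on_freelist[buf] = True

    def get_free_buffer(self):
        """Foreground path: a short critical section that pops the freelist."""
        with self.freelist_lock:
            if self.freelist:
                buf = self.freelist.popleft()
                self.on_freelist[buf] = False
                return buf
        return None  # a real backend would fall back to sweeping itself
```

For example, with four buffers where buffer 0 was touched twice, buffer 1 once and buffer 2 is pinned, a refill to two entries yields buffers 3 and 1: the pinned buffer is skipped, and the hand drains buffer 0's and buffer 1's usage counts as it passes.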

> Last time, the following tests were executed to validate the results:
>
> Test suite - pgbench
> DB Size - 16 GB
> RAM - 24 GB
> Shared Buffers - 2G, 5G, 7G, 10G
> Concurrency - 8, 16, 32, 64 clients
> Buffers pre-warmed before the start of the test
>
> Should we try any other scenarios, or are the above okay for an initial
> test of the patch?

Seems like a reasonable place to start.

...Robert

In response to

Re: Page replacement algorithm in buffer cache at 2013-04-05 19:08:10 from Robert Haas

Responses

Re: Page replacement algorithm in buffer cache at 2013-04-09 07:07:03 from Amit Kapila
