From: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
To: "'Robert Haas'" <robertmhaas(at)gmail(dot)com>
Cc: "'Greg Smith'" <greg(at)2ndquadrant(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Page replacement algorithm in buffer cache
Date: 2013-04-09 07:07:03
Message-ID: 006f01ce34f0$d6fa8220$84ef8660$@kapila@huawei.com
Lists: pgsql-hackers
> -----Original Message-----
> From: Robert Haas [mailto:robertmhaas(at)gmail(dot)com]
> Sent: Tuesday, April 09, 2013 9:28 AM
> To: Amit Kapila
> Cc: Greg Smith; pgsql-hackers(at)postgresql(dot)org
> Subject: Re: [HACKERS] Page replacement algorithm in buffer cache
>
> On Fri, Apr 5, 2013 at 11:08 PM, Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
> wrote:
> > I still have one more doubt. Consider the below scenario for the two
> > cases: invalidating buffers while moving them to the freelist vs. just
> > moving them to the freelist.
> >
> > A backend gets the buffer from the freelist for a request of page-9
> > (number 9 is random, just to explain), but it still has an association
> > with another page-10. The backend needs to add the buffer with the new
> > tag (new page association) to the buffer hash table and remove the
> > entry with the old tag (old page association).
> >
> > The benefit of just moving to the freelist is that if we get a request
> > for the same page before somebody else uses the buffer for another
> > page, it will save a read I/O. However, on the other side, in many
> > cases the backend will need an extra partition lock to remove the old
> > tag (which can lead to some bottleneck).
> >
> > I think saving read I/O is more beneficial, but I am just not sure it
> > is best, as the cases where it helps might be few.
>
> I think saving read I/O is a lot more beneficial. I haven't seen
> evidence of a severe bottleneck updating the buffer mapping tables. I
> have seen some evidence of spinlock-level contention on read workloads
> that fit in shared buffers, because in that case the system can run
> fast enough for the spinlocks protecting the lwlocks to get pretty
> hot. But if you're doing writes, or if the workload doesn't fit in
> shared buffers, other bottlenecks slow you down enough that this
> doesn't really seem to become much of an issue.
>
> Also, even if you *can* find some scenario where pushing the buffer
> invalidation into the background is a win, I'm not convinced that
> would justify doing it, because the case where it's a huge loss -
> namely, working set just a tiny bit smaller than shared_buffers - is
> pretty obvious. I don't think we dare fool around with that; the
> townspeople will arrive with pitchforks.
>
> I believe that the big win here is getting the clock sweep out of the
> foreground so that BufFreelistLock doesn't catch fire. The buffer
> mapping locks are partitioned and, while it's not like that completely
> gets rid of the contention, it sure does help a lot. So I would view
> that goal as primary, at least for now. If we get a first round of
> optimization done in this area, that doesn't preclude further
> improving it in the future.
I agree with you that this can be a first step towards improvement.
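
To make the trade-off concrete, here is roughly the dance a backend does
today when it recycles a buffer that still carries a valid old tag. This is
only a simplified sketch loosely modeled on BufferAlloc() in bufmgr.c, not
the actual code; pinning, usage_count handling, I/O, the case where another
backend has already inserted the same tag, and error paths are all elided:

#include "storage/buf_internals.h"
#include "storage/lwlock.h"

/* Sketch only: retarget a victim buffer from oldTag to newTag. */
static void
swap_buffer_tag(volatile BufferDesc *buf, BufferTag *oldTag, BufferTag *newTag)
{
	uint32		oldHash = BufTableHashCode(oldTag);
	uint32		newHash = BufTableHashCode(newTag);
	LWLockId	oldPartitionLock = BufMappingPartitionLock(oldHash);
	LWLockId	newPartitionLock = BufMappingPartitionLock(newHash);

	/*
	 * Take both mapping partition locks, always in a consistent order,
	 * so two backends swapping in opposite directions cannot deadlock.
	 */
	if (oldPartitionLock < newPartitionLock)
	{
		LWLockAcquire(oldPartitionLock, LW_EXCLUSIVE);
		LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);
	}
	else if (oldPartitionLock > newPartitionLock)
	{
		LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);
		LWLockAcquire(oldPartitionLock, LW_EXCLUSIVE);
	}
	else
		LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);	/* same partition */

	/* Enter the new mapping, then drop the stale one. */
	(void) BufTableInsert(newTag, newHash, buf->buf_id);
	BufTableDelete(oldTag, oldHash);

	LWLockRelease(newPartitionLock);
	if (oldPartitionLock != newPartitionLock)
		LWLockRelease(oldPartitionLock);
}

The oldPartitionLock acquisition is the extra lock I was referring to: if
the background process had already invalidated the tag, only
newPartitionLock would be needed here, but a re-request for page-10 in the
meantime could then no longer be satisfied without a read.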
> > Last time, the following tests were executed to validate the results:
> >
> > Test suite - pgbench
> > DB Size - 16 GB
> > RAM - 24 GB
> > Shared Buffers - 2G, 5G, 7G, 10G
> > Concurrency - 8, 16, 32, 64 clients
> > Pre-warm the buffers before start of test
> >
> > Shall we try any other scenarios, or are the above okay for an initial
> > test of the patch?
>
> Seems like a reasonable place to start.
I shall work on this for the first CF of 9.4.
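
For anyone following along, the foreground path that makes BufFreelistLock
so hot, and that moving the clock sweep to the background is meant to
relieve, looks roughly like this today. Again a simplified sketch, loosely
modeled on StrategyGetBuffer() in freelist.c; StrategyControl is actually
private to that file, and the freelist fast path, pin-count safeguards and
"all buffers pinned" handling are elided:

#include "storage/buf_internals.h"
#include "storage/lwlock.h"

/* Sketch only: turn the clock hand until an unpinned, unused buffer appears. */
static volatile BufferDesc *
clock_sweep_victim(void)
{
	volatile BufferDesc *buf;

	/* Every backend that needs a victim serializes on this one lock. */
	LWLockAcquire(BufFreelistLock, LW_EXCLUSIVE);

	for (;;)
	{
		buf = &BufferDescriptors[StrategyControl->nextVictimBuffer];
		if (++StrategyControl->nextVictimBuffer >= NBuffers)
			StrategyControl->nextVictimBuffer = 0;

		LockBufHdr(buf);
		if (buf->refcount == 0 && buf->usage_count == 0)
		{
			/* Victim found; hand it back still header-locked. */
			LWLockRelease(BufFreelistLock);
			return buf;
		}
		if (buf->usage_count > 0)
			buf->usage_count--;		/* earn another trip around the clock */
		UnlockBufHdr(buf);
	}
}

If a background process kept the freelist populated, backends would mostly
pop buffers from it instead of all contending on BufFreelistLock to turn
the clock hand.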
With Regards,
Amit Kapila.