From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila(at)huawei(dot)com> |
Cc: | Greg Smith <greg(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Page replacement algorithm in buffer cache |
Date: | 2013-04-09 03:58:13 |
Message-ID: | CA+TgmoYhWsz__KtSxm6BuBirE7VR6Qqc_COkbEZTQpk8oom3CA@mail.gmail.com |
Lists: | pgsql-hackers |
On Fri, Apr 5, 2013 at 11:08 PM, Amit Kapila <amit(dot)kapila(at)huawei(dot)com> wrote:
> I still have one more doubt. Consider the scenario below, comparing the case where we
> invalidate buffers while moving them to the freelist vs. just moving them to the freelist.
>
> A backend gets a buffer from the freelist for a request for page-9 (the number 9 is
> arbitrary, just to explain); the buffer still has an association with another page, page-10.
> The backend needs to add the buffer with the new tag (new page association) to the bufhash
> table and remove the entry for oldTag (old page association).
>
> The benefit of just moving to the freelist is that if we get a request for the same
> page before somebody else has reused the buffer for another page, it will save read I/O.
> On the other hand, in many cases the
> backend will need an extra partition lock to remove oldTag (which can lead to
> some bottleneck).
>
> I think saving read I/O is more beneficial, but I'm just not sure that is best,
> as such cases might be rare.
I think saving read I/O is a lot more beneficial. I haven't seen
evidence of a severe bottleneck updating the buffer mapping tables. I
have seen some evidence of spinlock-level contention on read workloads
that fit in shared buffers, because in that case the system can run
fast enough for the spinlocks protecting the lwlocks to get pretty
hot. But if you're doing writes, or if the workload doesn't fit in
shared buffers, other bottlenecks slow you down enough that this
doesn't really seem to become much of an issue.
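
The trade-off being discussed can be illustrated with a toy model. This is only a sketch under strong assumptions, not PostgreSQL code: every buffer goes back to the tail of the freelist right after each use, there is no pinning or locking, and the names `simulate`, `mapping`, and `tag_of` are invented for illustration. It counts simulated disk reads and how often a backend has to remove an old tag from the mapping table.

```python
from collections import deque

def simulate(requests, n_buffers, invalidate_on_free):
    """Toy model (hypothetical, not PostgreSQL code).

    Returns (reads, old_tag_removals) for a sequence of page requests.
    """
    mapping = {}                      # page tag -> buffer id ("bufhash")
    tag_of = [None] * n_buffers       # buffer id -> page tag it still holds
    freelist = deque(range(n_buffers))
    reads = 0
    old_tag_removals = 0

    for page in requests:
        buf = mapping.get(page)
        if buf is None:
            # Miss: simulated read I/O, take a buffer from the freelist head.
            reads += 1
            buf = freelist.popleft()
            old = tag_of[buf]
            if old is not None:
                # The buffer still had an old association; the backend must
                # remove it (the "extra partition lock" case in the mail).
                del mapping[old]
                old_tag_removals += 1
            tag_of[buf] = page
            mapping[page] = buf
        else:
            # Hit on a buffer that is sitting on the freelist: reclaim it.
            freelist.remove(buf)

        # Assumption: the buffer returns to the freelist tail immediately.
        if invalidate_on_free:
            del mapping[page]
            tag_of[buf] = None
        freelist.append(buf)

    return reads, old_tag_removals
```

In this model, `simulate([9, 9, 10, 9], 2, invalidate_on_free=False)` performs two reads, while the same requests with `invalidate_on_free=True` perform four, because every repeat request for page 9 misses. The price of keeping the tag shows up as old-tag removals on later misses, for example with `simulate([9, 10, 11], 2, False)`.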
Also, even if you *can* find some scenario where pushing the buffer
invalidation into the background is a win, I'm not convinced that
would justify doing it, because the case where it's a huge loss -
namely, working set just a tiny bit smaller than shared_buffers - is
pretty obvious. I don't think we dare fool around with that; the
townspeople will arrive with pitchforks.
I believe that the big win here is getting the clock sweep out of the
foreground so that BufFreelistLock doesn't catch fire. The buffer
mapping locks are partitioned and, while it's not like that completely
gets rid of the contention, it sure does help a lot. So I would view
that goal as primary, at least for now. If we get a first round of
optimization done in this area, that doesn't preclude further
improving it in the future.
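
For readers unfamiliar with the clock sweep mentioned above, here is a minimal sketch of the general technique. It is a simplified illustration, not the actual PostgreSQL implementation: the `Buffer` class and `clock_sweep` function are hypothetical names, and there is no locking at all. The point of the paragraph above is that when each backend runs a loop like this itself while holding a single lock, that lock becomes hot; running it in a background process would let backends simply take a ready buffer.

```python
from dataclasses import dataclass

@dataclass
class Buffer:
    usage_count: int = 0   # bumped on access, decayed by the sweep
    pinned: bool = False   # pinned buffers cannot be evicted

def clock_sweep(buffers, hand):
    """Return (victim_index, new_hand) using a clock sweep.

    Hypothetical sketch: advances the hand, decrementing usage counts,
    until it finds an unpinned buffer whose usage count is zero.
    Raises RuntimeError if a full pass sees only pinned buffers.
    """
    n = len(buffers)
    tries_left = n
    while True:
        idx = hand
        hand = (hand + 1) % n
        buf = buffers[idx]
        if buf.pinned:
            tries_left -= 1
            if tries_left == 0:
                raise RuntimeError("no unpinned buffers available")
        elif buf.usage_count == 0:
            return idx, hand
        else:
            # Give the buffer another chance, but age it.
            buf.usage_count -= 1
            tries_left = n
```

Note that a buffer with a high usage count forces the hand to pass over it several times before it becomes a victim, which is why the sweep can take a while when most buffers are recently used.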
> Last time, the following tests were executed to validate the results:
>
> Test suite - pgbench
> DB Size - 16 GB
> RAM - 24 GB
> Shared Buffers - 2G, 5G, 7G, 10G
> Concurrency - 8, 16, 32, 64 clients
> Pre-warm the buffers before start of test
>
> Shall we try any other scenarios, or are the above okay for initial testing of the
> patch?
Seems like a reasonable place to start.
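
As a rough illustration, the quoted test matrix could be enumerated like this. It is a sketch only: the run duration, the database name `bench`, and the choice of one pgbench thread per client are assumptions not stated in the thread, and changing shared_buffers between runs requires a server restart that is not shown here.

```python
from itertools import product

# Values taken from the quoted test plan.
SHARED_BUFFERS = ["2GB", "5GB", "7GB", "10GB"]
CLIENTS = [8, 16, 32, 64]

def test_matrix(duration_s=300, dbname="bench"):
    """Return (shared_buffers, clients, pgbench command) for every combination.

    duration_s and dbname are hypothetical defaults, not from the thread.
    """
    runs = []
    for shared_buffers, clients in product(SHARED_BUFFERS, CLIENTS):
        # -c: client count, -j: worker threads, -T: run time in seconds
        cmd = f"pgbench -c {clients} -j {clients} -T {duration_s} {dbname}"
        runs.append((shared_buffers, clients, cmd))
    return runs
```

This yields 16 runs, one per combination of shared_buffers setting and client count.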
...Robert