Re: Clock sweep not caching enough B-Tree leaf pages?

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Greg Stark <stark(at)mit(dot)edu>, Peter Geoghegan <pg(at)heroku(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Clock sweep not caching enough B-Tree leaf pages?
Date: 2014-04-17 14:48:15
Message-ID: 20140417144814.GB7443@momjian.us
Lists: pgsql-hackers

On Thu, Apr 17, 2014 at 10:40:40AM -0400, Robert Haas wrote:
> On Thu, Apr 17, 2014 at 10:32 AM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> > On Thu, Apr 17, 2014 at 10:18:43AM -0400, Robert Haas wrote:
> >> I also believe this to be the case on first principles and my own
> >> experiments. Suppose you have a workload that fits inside
> >> shared_buffers. All of the usage counts will converge to 5. Then,
> >> somebody accesses a table that is not cached, so something's got to be
> >> evicted. Because all the usage counts are the same, the eviction at
> >> this point is completely indiscriminate. We're just as likely to kick
> >> out a btree root page or a visibility map page as we are to kick out a
> >> random heap page, even though the former have probably been accessed
> >> several orders of magnitude more often. That's clearly bad. On
> >> systems that are not too heavily loaded it doesn't matter too much
> >> because we just fault the page right back in from the OS pagecache.
> >> But I've done pgbench runs where such decisions lead to long stalls,
> >> because the page has to be brought back in from disk, and there's a
> >> long I/O queue; or maybe just because the kernel thinks PostgreSQL is
> >> issuing too many I/O requests and makes some of them wait to cool
> >> things down.
> >
> > I understand now. If there is no memory pressure, every buffer reaches
> > the max usage count, and when a new buffer comes in it isn't at the
> > max, so it is swiftly evicted until the clock sweep has had time to
> > decrement the old buffers. Decaying buffers when there is no memory
> > pressure adds overhead and raises timing questions about when to decay.
>
> That can happen, but the real problem I was trying to get at is that
> when all the buffers get up to max usage count, they all appear
> equally important. But in reality they're not. So when we do start
> evicting those long-resident buffers, it's essentially random which
> one we kick out.
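[The saturation behavior described above can be sketched as a small simulation. This is a toy model, not PostgreSQL's actual buffer manager: the pool size and names are hypothetical, though MAX_USAGE = 5 matches BM_MAX_USAGE_COUNT. It shows that once every usage count pins at the max, the sweep must decrement the whole pool before anything is evictable, and the victim is simply whichever buffer the hand reaches first.]

```python
MAX_USAGE = 5  # matches PostgreSQL's BM_MAX_USAGE_COUNT

class Buffer:
    def __init__(self, name):
        self.name = name
        self.usage = 0

def access(buf):
    # Each access bumps the usage count, capped at the maximum.
    buf.usage = min(buf.usage + 1, MAX_USAGE)

def clock_sweep(buffers, hand):
    # Advance the hand, decrementing usage counts, until a buffer
    # with usage 0 is found; that buffer is the eviction victim.
    while True:
        buf = buffers[hand]
        if buf.usage == 0:
            return hand
        buf.usage -= 1
        hand = (hand + 1) % len(buffers)

# Workload fits in shared_buffers: every page is touched repeatedly,
# so every usage count saturates at the maximum.
pool = [Buffer(f"page{i}") for i in range(8)]
for _ in range(100):
    for b in pool:
        access(b)

assert all(b.usage == MAX_USAGE for b in pool)

# Now a new page needs a buffer. The sweep must visit every buffer
# five times before anything is evictable, and the victim carries no
# information: a btree root page looks exactly like a random heap page.
victim = clock_sweep(pool, hand=0)
print(pool[victim].name)  # prints page0 -- purely an accident of hand position
```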

True. Ideally we would have some way to know that _all_ the buffers had
reached the maximum and kick off a sweep to decrement them all. I am
unclear how we would do that. One odd idea would be to have a global
counter that is incremented every time a buffer goes from 4 to 5 (max)
--- when the counter equals 50% of all buffers, do a clock sweep. Of
course, then the counter becomes a bottleneck.
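[The global-counter idea above can be sketched like this. Everything here is a hypothetical toy (pool size, names, and the 50% trigger are taken from the proposal, not from any real patch): a counter tracks 4-to-5 transitions, and when half the pool is saturated, a proactive sweep decrements every buffer so hot pages must re-earn their high counts before eviction time.]

```python
MAX_USAGE = 5               # matches PostgreSQL's max usage count
NBUFFERS = 8                # toy pool size (hypothetical)
THRESHOLD = NBUFFERS // 2   # "50% of all buffers" from the proposal

usage = [0] * NBUFFERS
saturated = 0               # global counter of 4 -> 5 transitions

def decay_all():
    # The proposed proactive sweep: decrement every in-use buffer once.
    global saturated
    for i in range(NBUFFERS):
        if usage[i] > 0:
            usage[i] -= 1
    saturated = 0           # nothing is at the max any more

def access(i):
    global saturated
    if usage[i] < MAX_USAGE:
        usage[i] += 1
        if usage[i] == MAX_USAGE:
            saturated += 1  # a buffer just hit the max
    if saturated >= THRESHOLD:
        decay_all()         # decay now, before eviction time

# A hot working set of four pages saturates half the pool, which
# triggers the decrement sweep instead of letting counts pin at 5.
for i in range(4):
    for _ in range(MAX_USAGE):
        access(i)

print(usage)  # prints [4, 4, 4, 4, 0, 0, 0, 0]
```

As the thread notes, a single shared counter touched on every 4-to-5 transition would itself become a contention bottleneck; this sketch only illustrates the mechanism, not a scalable implementation.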

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +
