Re: Page replacement algorithm in buffer cache

From: Jim Nasby <jim(at)nasby(dot)net>
To: Ants Aasma <ants(at)cybertec(dot)at>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Amit Kapila <amit(dot)kapila(at)huawei(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Page replacement algorithm in buffer cache
Date: 2013-04-01 22:56:19
Message-ID: 515A1093.8090403@nasby.net
Lists: pgsql-hackers

On 3/23/13 7:41 AM, Ants Aasma wrote:
> On Sat, Mar 23, 2013 at 6:04 AM, Jim Nasby <jim(at)nasby(dot)net> wrote:
>> Partitioned clock sweep strikes me as a bad idea... you could certainly get
>> unlucky and end up with a lot of hot stuff in one partition.
>
> Surely that is not worse than having everything in a single partition.
> Given a decent partitioning function it's very highly unlikely to have
> more than a few of the hottest buffers end up in a single partition.

One could argue that it is worse, because you've added another layer of unpredictability to performance. If something suddenly puts two heavily hit sets in the same partition, your previously good performance tanks.
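To make the worry concrete, here is a rough Python sketch, not PostgreSQL code. The modulo mapping from buffer id to partition and both function names are assumptions made up for illustration; a real partitioning function would presumably hash the buffer tag. It counts how many hot buffers land in each partition and, over random placements, reports the worst pile-up seen in any single partition.

```python
import random
from collections import Counter

def hot_buffers_per_partition(hot_ids, n_partitions):
    """Count hot buffers per partition.

    Assumption: a buffer belongs to partition (buf_id % n_partitions).
    """
    counts = Counter(buf_id % n_partitions for buf_id in hot_ids)
    return [counts.get(p, 0) for p in range(n_partitions)]

def worst_case_share(n_hot, n_buffers, n_partitions, trials=1000, seed=0):
    """Largest number of hot buffers seen in one partition across
    `trials` random choices of which buffers are hot."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    worst = 0
    for _ in range(trials):
        hot = rng.sample(range(n_buffers), n_hot)
        worst = max(worst, max(hot_buffers_per_partition(hot, n_partitions)))
    return worst
```

For example, hot_buffers_per_partition([0, 16, 32, 5], 16) puts three of the four hot buffers in partition 0, which is the unlucky case described above. worst_case_share() gives a feel for how often random placement gets close to that.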

Maybe that issue isn't real enough to be worth worrying about, but I still think it'd be easier and cleaner to try keeping stuff on the free list first...

>> Another idea that's been brought up in the past is to have something in the
>> background keep a minimum number of buffers on the free list. That's how OS
>> VM systems I'm familiar with work, so there's precedent for it.
>>
>> I recall there were at least some theoretical concerns about this, but I
>> don't remember if anyone actually tested the idea.
>
> Yes, having bgwriter do the actual cleaning up seems like a good idea.
> The whole bgwriter infrastructure will need some serious tuning. There
> are many things that could be shifted to background if we knew it
> could keep up, like hint bit setting on dirty buffers being flushed
> out. But again, we have the issue of having good tests to see where
> the changes hurt.

I think at some point we need to stop depending on just bgwriter for all this stuff. I believe it would be much cleaner if we had separate procs for everything we needed (although some synergies might exist; if we wanted to set hint bits during write then bgwriter *is* the logical place to put that).

In this case, I don't think keeping stuff on the free list is close enough to checkpoints that we'd want bgwriter to handle both. At most we might want them to pass some metrics back and forth.
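
As a rough illustration of a separate process keeping a minimum number of buffers on the free list, here is a toy Python model. It is only a sketch under stated assumptions, not how PostgreSQL implements anything: the BufferPool class, its method names, the low_water parameter and the MAX_USAGE cap are all hypothetical. refill_free_list() stands in for the background process; get_buffer() is what a backend would call.

```python
from collections import deque

MAX_USAGE = 5  # cap on the per-buffer usage count (illustrative value)

class BufferPool:
    """Toy clock-sweep pool with a background-maintained free list."""

    def __init__(self, n_buffers, low_water):
        self.usage = [0] * n_buffers            # clock-sweep usage counts
        self.pinned = [False] * n_buffers       # pinned buffers are never evicted
        self.on_free_list = [False] * n_buffers
        self.free_list = deque()
        self.hand = 0                           # clock hand position
        self.low_water = low_water              # minimum free-list length

    def touch(self, buf):
        """Record an access to a buffer."""
        self.usage[buf] = min(self.usage[buf] + 1, MAX_USAGE)

    def _sweep_one(self):
        """Advance the hand, decrementing usage counts, until a buffer
        with usage 0 is found. Returns None if no buffer is evictable."""
        n = len(self.usage)
        for _ in range(n * (MAX_USAGE + 1)):
            buf = self.hand
            self.hand = (self.hand + 1) % n
            if self.pinned[buf] or self.on_free_list[buf]:
                continue
            if self.usage[buf] == 0:
                return buf
            self.usage[buf] -= 1
        return None

    def refill_free_list(self):
        """Background step: top the free list up to low_water.
        Returns how many buffers were added."""
        added = 0
        while len(self.free_list) < self.low_water:
            victim = self._sweep_one()
            if victim is None:
                break
            self.on_free_list[victim] = True
            self.free_list.append(victim)
            added += 1
        return added

    def get_buffer(self):
        """Foreground path: take from the free list if possible,
        otherwise fall back to running the sweep directly."""
        if self.free_list:
            buf = self.free_list.popleft()
            self.on_free_list[buf] = False
        else:
            buf = self._sweep_one()
        if buf is not None:
            self.usage[buf] = 1
        return buf
```

The point of the split is that backends normally hit the cheap popleft() path and only pay for a sweep when the background step has fallen behind. How often that happens is exactly the kind of metric the two processes might want to share.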
--
Jim C. Nasby, Data Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net
