Re: Enough RAM for entire Database.. cost aside, is this

From: Mike Rylander <miker(at)purplefrog(dot)com>
To: Andy B <abhousehunt(at)blueyonder(dot)co(dot)uk>
Subject: Re: Enough RAM for entire Database.. cost aside, is this
Date: 2004-07-02 19:53:34
Message-ID: cc4fe6$vhk$1@news.hub.org
Lists: pgsql-general

<posted & mailed>

Andy B wrote:

>
>> It's not that making the cache bigger is inefficient, it's that the cache
>> is not used the way you are thinking.
>
> Ok, I think I've got it now. The missing piece of the puzzle was the
> existence of the Linux buffer cache. So that's what the
> effective_cache_size value is for(!)
>
> I read Shridhar Daithankar and Josh Berkus's 'Tuning Postgresql for
> performance' document, and some things strike me (though only with respect
> to my RAM much bigger than database scenario.)
>
> I think I'm a bit more depressed than I was earlier today!
>
> 1. Postgresql has a two-tiered cache mechanism. The first tier - the
> postgresql shared buffer cache sits on the second, larger tier, the linux
> buffer cache. So bits of the same data end up being in memory...twice, and
> two cache mechanisms operate at the same time. (That's how I understand
> it).
>
> 2. Even if the linux buffer cache contains all the data required for an
> execution of a plan, there is still a load of memory copying to do between
> these two tiers. Though memory copying is faster than disk access, it is
> still an overhead, and isn't there the real problem of thrashing between
> these two tiers if the plan can't fit all the data into the top tier, even
> if the thrashing is restricted to the memory system?

Perhaps I'm a bit naive about complex data structure caching strategies, but
it seems to me that the overhead of tracking tuples (which is what you
would want if you are going to manage your own cache, as opposed to simply
caching disk pages as the OS does) would be memory-prohibitive except in
the most extreme cases of RAM>DB SIZE.
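To put rough numbers on that, pg_class already carries the planner's
estimates of both tuple and page counts, so you can eyeball the ratio on
your own data (the query below is only an illustration):

-- planner estimates: live tuples vs. 8KB disk pages, per ordinary table
SELECT relname, reltuples, relpages
  FROM pg_class
 WHERE relkind = 'r'
 ORDER BY relpages DESC;

A tuple-granular cache would have to track something on the order of the
reltuples population; a page cache only has to track the relpages one.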

>
> 3. The OS will implement a 'Least recently Used' cache replacement
> strategy on data in its cache. This isn't the optimal cache replacement
> strategy for the database. If postgresql is in charge of all its RAM
> resident data, it can make more intelligent decisions about which stuff
> isn't needed once used.
>

I find that experience does not bear this out. A coworker of mine has a
saying about apps that try to re-solve problems, caching in this case,
that are already well understood and generally handled well at other levels
of the "software stack"... he calls them "too smart by half" :)

> This still leads me to think that the best thing to do in my 'much bigger
> RAM than database size' scenario would be for postgresql to allocate a
> shared buffer cache big enough for all the data + a bit.
>

The problem with this assumption, as Tom Lane has said, is that generally
speaking, a kernel is much faster with respect to cache *lookup* than
postgres is. Postgres assumes that only a subset of the data stored in a
cluster will be needed at any one time. Under this assumption (that it
will need to store and shovel through thousands, instead of millions or
more, tuples) some optimizations have been made that would cause
performance degradation if used over the entire (potentially enormous)
dataset.
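
For what it's worth, the usual tuning advice falls straight out of that:
keep shared_buffers relatively modest and let effective_cache_size simply
describe how much the kernel is probably caching on postgres's behalf.
Illustrative postgresql.conf values only (both settings are counted in 8KB
pages), not a recommendation for your box:

# postgresql.conf -- numbers made up purely for illustration
shared_buffers = 16384          # 16384 * 8KB = 128MB managed by postgres itself
effective_cache_size = 131072   # ~1GB: a hint about what the OS cache likely holds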

> By doing this, there would be some performance enhancing gains, including:
>
> 1. If the planner *knew* (rather than guessed) that all the data was
> effectively 'there' in RAM, in the right place (i.e. its workspace),
> wouldn't it make choosing the optimal plan easier? (I see that
> effective_cache_size is a stab in that direction, but just because
> postgresql guesses the data will have been cached by linux, it doesn't
> mean it actually is - surely an important distinction.)
>
> 2. You'd avoid a whole layer of caching, along with the not necessarily
> aligned memory copies and other overheads that this introduces. Even in
> the optimal case where all the data needed does reside in RAM, it's not in
> the right bit of RAM. (I may have misunderstood this relationship between
> the shared buffer cache and the linux buffer cache - if no memory copying
> actually occurs - then I'll go away for a bit!)
>
> Two interesting things I dug up today:
>
> www.linuxjournal.com/article.php?sid=5482 (to do with a high performance
> DB living in an OS controlled environment)
>

Come now, Nov. 2001? Most of the issues brought up here have been solved
*in the kernel* since then. (In the general case, with the bevy of IO
schedulers, and the interruptible kernel.)

> and
>
> http://www.ixora.com.au/tips/avoid_buffered_io.htm (to do with Oracle
> tuning)

Well, Oracle isn't PG, and I can't speak to their buffering and caching
implementation.

>
> The way things are, I see that postgresql will 'gracefully degrade' in the
> very common scenario where the database shares a machine with limited
> resources and lots of processes, all of them hammering the machine, but in
> my 'optimal RAM-optimized db server' scenario, the scheme actually seems
> to reduce the potential for a blisteringly fast system.
>
> So given all of that, I can't help swinging back to my original question
> about whether it's a bad thing to make the shared buffer cache huge, so
> long as you have much more RAM than the size you choose.

See above...

>
> It'll still require all that nasty Linux cache <->postgresql cache memory
> copying, but it will at least reduce postgresql's exposure to the 'not
> particularly clever' least recently used cache replacement strategy
> employed by the OS.

<rant> EVIDENCE, PLEASE </rant>
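
If you want to gather that evidence yourself, the block-level statistics
views show how often postgres finds a page in its own buffers versus having
to go back to the kernel for it (this assumes stats_block_level is turned
on; note that a "read" here may still be served from the kernel's cache
rather than the disk):

-- per-table hits vs. reads for postgres's own shared buffers
SELECT relname, heap_blks_read, heap_blks_hit
  FROM pg_statio_user_tables
 ORDER BY heap_blks_read + heap_blks_hit DESC;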

>
> Am I getting any closer to understanding the way things are?
>
> Thanks for your tolerance,
> Andy
>
>
> p.s.
>
> (It would be great if you could just load the data directly into the
> shared buffer cache with direct IO, since that OS file buffer->shared
> buffer cache transfer is just a waste of effort in this scenario), and

<rant> again: EVIDENCE, PLEASE </rant>

> direct IO allows spectacular IO gains when you're working with certain
> raid configurations connected with a suitable fat wire.)
>

select count(*) from table;

Put that in a loop inside a function, selecting the table names from
INFORMATION_SCHEMA.TABLES, and every table gets pulled through the caches.
A rough sketch follows below.
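
Something like this untested PL/pgSQL sketch would do it (the function name
warm_cache and the schema filter are mine; old-style function-body quoting,
hence the doubled single quotes):

CREATE OR REPLACE FUNCTION warm_cache() RETURNS integer AS '
DECLARE
    t RECORD;
    scanned integer := 0;
BEGIN
    FOR t IN
        SELECT table_schema, table_name
          FROM information_schema.tables
         WHERE table_type = ''BASE TABLE''
           AND table_schema NOT IN (''pg_catalog'', ''information_schema'')
    LOOP
        -- count(*) forces a full scan, dragging every page of the table
        -- through the caches; the result itself is thrown away
        EXECUTE ''SELECT count(*) FROM ''
             || quote_ident(t.table_schema) || ''.''
             || quote_ident(t.table_name);
        scanned := scanned + 1;
    END LOOP;
    RETURN scanned;  -- number of tables touched
END;
' LANGUAGE plpgsql;

SELECT warm_cache();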

> Ok - I'll shutup now

We (well, I) don't want you to shut up. In fact, I enjoy discussions like
this, because it freshens my memory on just why PG rocks so hard! ;)

--miker
