From: "Andy B" <abhousehuntRE-M--O--V-E(at)blueyonder(dot)co(dot)uk>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Enough RAM for entire Database.. cost aside, is this
Date: 2004-07-02 23:42:34
Message-ID: KpmFc.11407$HQ1.9258@fe2.news.blueyonder.co.uk

Hello again Mike,

Thanks for the replies! Here's my next salvo!

> Perhaps I'm a bit naive about complex data structure caching strategies, but
> it seems to me that the overhead of tracking tuples (which is what you
> would want if you are going to manage your own cache, as opposed to simply
> caching disk pages as the OS does) would be memory-prohibitive except in
> the most extreme cases of RAM>DB SIZE.

But doesn't postgresql already make a decision about what needs to be in its
workspace prior to executing a plan? I'm not actually suggesting it do
anything different. In fact, the only thing I'm suggesting is that the
workspace need not be limited in size.

Remember - I'm only talking about the scenario where the entire database -
every single tuple - can reside in the workspace (the shared buffer cache) at
the same time. There is no extra 'managing your own cache', as you put it:
just let PostgreSQL's postmaster do what it does, assuming it doesn't kick
anything out of its shared buffer cache unless it needs the space for
something else. (OK, that may be an incorrect assumption - I need to find out
for sure.)

But I think the overall effort is less, not more. Once in the shared buffer
cache, a tuple never needs to be kicked out again to make way for another.
That's what is so nice. Everything is just there: no need to calculate the
least damaging cache replacement strategy, no need to wait for blocks of data
to work their way from the disk or the Linux buffer cache into the PostgreSQL
workspace - it all ends up there, and stays there, with only syncing activity
needed when updates happen.

>
> I find that experience does not bear this out. There is a saying a coworker
> of mine has about apps that try to solve problems, in this case caching,
> that are well understood and generally handled well at other levels of the
> "software stack"... he calls them "too smart by half" :)

Well, I guess I'd agree and disagree with your friend. What a political
animal I am!

Here's why.

Imagine this scenario. Your application uses five pieces of data - A, B, C, D
and E. Only three can sit in memory at once.

You want to do two transactions. The first combines A, B and C, so you load
those, in order.

Once you're finished with that first transaction, you then need to combine A,
C and D.

Now, an application-driven cache will see immediately that the optimal cache
replacement is to load D into the slot taken by B. With the OS in charge,
you'll probably end up with cache thrash.

This is simplistic, but it is just this sort of smart caching that can lead
to dramatic speedups, and I've got quite a bit of experience of seeing just
that.
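
To illustrate (at the risk of labouring the point), here's a tiny sketch in
C - nothing to do with PostgreSQL internals, just the A/B/C/D example above -
of the decision an application-driven cache can make because it knows the next
transaction's working set:

/* A toy sketch, not PostgreSQL code: if the application knows the next
 * transaction's working set, it can pick an eviction victim the OS cannot. */
#include <stdio.h>
#include <string.h>

#define SLOTS 3

static char cache[SLOTS];                 /* three in-memory slots */

/* Make sure 'needed' is cached, evicting a block the next transaction
 * does not need. */
static void load(char needed, const char *next_txn)
{
    for (int i = 0; i < SLOTS; i++)
        if (cache[i] == needed)
            return;                       /* already resident - a cache hit */

    for (int i = 0; i < SLOTS; i++) {
        if (strchr(next_txn, cache[i]) == NULL) {  /* not needed soon */
            printf("evict %c, load %c\n", cache[i], needed);
            cache[i] = needed;
            return;
        }
    }
}

int main(void)
{
    memcpy(cache, "ABC", SLOTS);          /* state after the first transaction */
    const char *txn2 = "ACD";             /* next transaction's working set */

    for (const char *p = txn2; *p; p++)
        load(*p, txn2);                   /* A and C hit; D replaces only B */

    return 0;
}

The generic OS page cache can't make that call, because it has no idea that D
is about to be needed and E isn't.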

I don't want to devalue the Linux cache strategy. It is very appropriate for
almost all situations; I'm just talking about a very specific scenario here.

Does that make sense?

> The problem this this assumption, as Tom Lane has said, is that generally
> speaking, a kernel is much faster with respect to cache *lookup* than
> postgres is. Postgres assumes that only a subset of the data stored in a
> cluster will be needed at any one time. Under this assumption (that it
> will need to store and shovel through thousands, instead of millions or
> more, tuples) some optimizations have been made that would cause
> performance degradation if used over the entire (potentially enormous)
> dataset.
>

Again, in my scenario postgres is *relieved* of the task of deciding which
data should be in the workspace, since ALL of the data is accessible there.

The point about postgres having to shovel through millions of tuples is a
separate issue. If I do a query which can only be executed by looking at a
million tuples, then that is what has to happen, no matter what caching
strategy you employ. Obviously, if you're shovelling data in and out of
memory, then the *way* you traverse those tuples becomes important to
efficiency, since loading sequential data off a disk is always faster. If
it's *all* in the workspace, it doesn't matter at all - the planner can
forget about disks and choose the fastest algorithmic option.

Again - this is a simplification, but I hope you can see what I'm getting
at.
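
As an aside: if the data really were all resident, I imagine you'd also want
to tell the planner so. A minimal sketch of the relevant postgresql.conf
settings (the parameter names are real; the values are purely illustrative):

# number of 8 kB shared buffers - sized to cover the whole data set
shared_buffers = 250000
# tell the planner roughly how much data can be held in cache
effective_cache_size = 250000
# random reads that never actually hit the disk are nearly as cheap as
# sequential ones
random_page_cost = 1.5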

> Come now, Nov. 2001? Most of the issues brought up here have been solved
> *in the kernel* since then. (In the general case, with the bevy of IO
> schedulers, and the interuptable kernel.)

The point is that if your data is sitting in two places in memory at the same
time, that's a waste of memory, and it's extra time spent copying all that
stuff around. The date of the article isn't really relevant to this, I don't
think.

They make the point that no matter how efficient the OS kernel is, in the
sort of scenario I am interested in, it just ends up being an unnecessary
step.
>
> Well, Oracle isn't PG, and I can't speak to their buffering and caching
> implementation.

I'm just saying that these are generic issues for any high performance
database server, whether it is Oracle, PostgreSQL, MySQL etc. The things they
highlight in that article are the very things I allude to here for my special
scenario. Even if you're a PostgreSQL fan (and I chose to use it too), it is
worth poking your head over the parapet and seeing how others are dealing
with these issues.

> >
> > (It would be great if you could just load the data directly into the
> > shared buffer cache with direct IO, since that OS file buffer->shared
> > buffer cache transfer is just a waste of effort in this scenario), and
>
> <rant> again: EVIDENCE, PLEASE </rant>

In this case, I can speak with a little authority (for once): this is just
best practice for a high performance system. If you want an application to
get at its data quickly, you use block-aligned direct IO to do a single
transfer from the disk subsystem (or logical volume or whatever) to the
address space where the app wants the data. In fact, the larger you can make
a single direct IO transfer, the nearer you can get to the theoretical IO
bandwidth limit. Sticking a generic Linux-style buffer cache in the middle
destroys any chance of touching the performance you could otherwise get from
a high performance IO subsystem. You'll just have to trust me on this
one... :) Of course, in my scenario, the only interest in high speed loading
is to populate the workspace when the server starts up initially, though
optimising write performance is something that will benefit you from then on.

I've spent years getting data on and off disks quickly, and even something as
simple as a transfer that isn't aligned to a machine page boundary can slow
things down by an order of magnitude. It's just a question of taking the
straightest, most 'hardware optimised' route.
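
To make that concrete, here's a minimal sketch of the kind of transfer I
mean - a page-aligned O_DIRECT read loop on Linux (the file path, alignment
and transfer size are just placeholders):

#define _GNU_SOURCE         /* for O_DIRECT on Linux/glibc */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    const size_t align = 4096;      /* align the buffer to the page/sector size */
    const size_t chunk = 1 << 20;   /* 1 MB per transfer - larger transfers
                                       get closer to the raw bandwidth */
    void *buf;

    /* O_DIRECT needs the buffer, offset and length suitably aligned */
    if (posix_memalign(&buf, align, chunk) != 0) {
        perror("posix_memalign");
        return 1;
    }

    int fd = open("/path/to/datafile", O_RDONLY | O_DIRECT);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    ssize_t n;
    while ((n = read(fd, buf, chunk)) > 0) {
        /* the data lands straight in buf, bypassing the OS page cache */
    }

    close(fd);
    free(buf);
    return 0;
}

The data goes from the device into the buffer you hand it, with no
intermediate copy through the kernel's page cache getting in the way.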

>
> We (well, I) don't want you to shut up. Infact I enjoy discussions like
> this, because it freshens my memory on just why PG rocks so hard! ;)
>
Cool! I'm glad that you're really into PG, because I'm sort of counting on
it too, so that makes me feel happier, and it's really good to see the
newsgroups for the database being so active.

Thanks anyway for the replies. I'm going to do a lot more digging and see if
I can find anyone who has got their hands dirty trying to do this
'everything in RAM' scenario.

I'll come back with anything interesting that I find.

Regards
Andy
