From: | "Andy B" <abhousehuntRE-M--O--V-E(at)blueyonder(dot)co(dot)uk> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Enough RAM for entire Database.. cost aside, is this |
Date: | 2004-07-03 09:42:38 |
Message-ID: | icvFc.18321$HQ1.1803@fe2.news.blueyonder.co.uk |
Lists: pgsql-general
>If all the data fits into memory, then this cache thrashing won't occur,
>yes?
No - it *can* occur in a 2 tier cache strategy.
The critical question here is: *If* the data postgresql needs is in the
linux buffer cache, what (if anything) does the OS have to do to make it
available to the postmaster? The worst case would be an actual memory copy
of the data from the linux cache to where postgresql actually asked for the
file data to be loaded. I'll need to find out more about the linux buffer
cache to answer this question for myself.
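Here's the kind of throwaway test I have in mind once I get a machine. It's a rough sketch only - not PostgreSQL code, the file name is just a placeholder, and it reflects my current (unverified) understanding: a plain read() copies the bytes out of the page cache into a private buffer even on a cache hit, whereas mmap() just maps the cached pages into the process's address space.

/*
 * Minimal sketch (not PostgreSQL code): two ways a process can get at file
 * data that is already sitting in the linux page cache.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/tmp/example.dat";   /* placeholder file name */
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    /*
     * 1) read(): even if every page of the file is already in the kernel's
     *    page cache, the kernel still copies the bytes from the cache into
     *    this user-space buffer - the "actual memory copy" worst case.
     */
    char *buf = malloc(st.st_size);
    if (buf == NULL) { perror("malloc"); return 1; }
    ssize_t n = read(fd, buf, st.st_size);
    printf("read() copied %zd bytes into a private buffer\n", n);

    /*
     * 2) mmap(): the page-cache pages are mapped straight into this
     *    process's address space, so a cache hit needs no data copy at all,
     *    only page table updates.
     */
    char *map = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) { perror("mmap"); return 1; }
    printf("mmap() first byte: %d (no copy of the file data)\n", map[0]);

    munmap(map, st.st_size);
    free(buf);
    close(fd);
    return 0;
}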
If a memory copy is required, then this is a good way of visualising it:
I live in London, but have to work in 5th avenue, NYC for a week.
My company stupidly sets me up in a smelly hotel by JFK airport.
London is 'The disk system'
5th Avenue is 'The postgresql workspace'
My smelly hotel at JFK is the 'linux buffer cache'.
Clearly, it's far quicker to commute to work each day from JFK, than it is
to commute from London each day.
But wouldn't it be faster still if I had a room in the swanky hotel on Fifth
avenue above the office?
Yes, I'd still have to arrive at JFK, but then I could stay in 5th avenue
all week until I'm finished.
And as far as direct IO is concerned, it gets better: that would be like
taking a plane right straight from London to 5th avenue, avoiding the
tedious immigration line at JFK.
Having said all of that, I'll go and find out more about the linux buffer
cache before I pester the PostgreSQL cache experts.
<snipped your test plan>
Thanks for that test plan...
I'm in the same position as you w.r.t. machinery - hence my thread (I was
hoping someone had tried this!). But I will get my hands on something quite
soon, and I'll do your test and post the results. I knew I'd have no option
but to become very familiar with the behaviour of whatever DB I chose, so
I'm looking forward to it (sort of).
>That is true, and will be until PG implements its own cache memory
>management AND on-disk storage. I'd be willing to bet that any patches
>that dramatically increase speed for the majority of use cases would be
>considered strongly.
I have to find out for sure, but since linux is now used in lots of HPC
clusters and in some real-time applications, I'd be very surprised if there
weren't already a direct IO facility in the kernel - that's one of the
things I'm looking into now. If there is, and assuming that the postgresql
cache operates on arbitrarily sized pages (it seems to be set to 8KB pages
by default, which, depending on the platform, may be good, assuming also
that the address space is appropriately machine-page aligned), then it
might be 'very easy indeed' to upgrade postgresql's cache manager to fire
the necessary IO calls at the OS. Again - this is really something I need
to talk to a pg cache expert about, but I've got a lot of catching up to do
first.
By the way - Direct IO doesn't mean the application has to know about the
disk configuration or anything messy like that. It is still a block IO
operation directed through the kernel. From the programmer's point of view
the changes are trivial. It just bypasses the OS's own generic file cache
and does a single, very efficient DMA transfer between the IO system and
the host RAM - all of which has to be managed by the OS, or you'll end up
with a platform-specific implementation that is of no real use.
I would run a mile if this work required Postgresql to know about actual
storage.
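To show what I mean by trivial, here's a rough Linux-flavoured sketch.
Again, this is not PostgreSQL code - the file name and block size are
placeholders, and it's based on my reading of the O_DIRECT documentation,
so treat it as an assumption: asking for direct IO is essentially one extra
open() flag plus an alignment requirement on the buffer.

/*
 * Sketch only: a direct read that bypasses the kernel's file cache. The
 * application still issues an ordinary block read through the kernel and
 * never needs to know anything about the underlying storage.
 */
#define _GNU_SOURCE            /* for O_DIRECT on linux */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define BLOCK_SIZE 8192        /* postgresql's default page size */

int main(void)
{
    /* placeholder file name, for illustration only */
    int fd = open("/tmp/example.dat", O_RDONLY | O_DIRECT);
    if (fd < 0) { perror("open(O_DIRECT)"); return 1; }

    /*
     * O_DIRECT requires the user buffer to be suitably aligned (typically
     * to the logical block size); posix_memalign() takes care of that.
     */
    void *buf;
    if (posix_memalign(&buf, BLOCK_SIZE, BLOCK_SIZE) != 0) {
        fprintf(stderr, "posix_memalign failed\n");
        return 1;
    }

    /*
     * This read skips the generic file cache: the block is DMA'd from the
     * IO system straight into buf, with no intermediate copy.
     */
    ssize_t n = read(fd, buf, BLOCK_SIZE);
    printf("direct read returned %zd bytes\n", n);

    free(buf);
    close(fd);
    return 0;
}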
>Finding target data (complex lookup in the PG shared buffers) will take
>longer than transferring disk pages from the OS cache, based on index
>information, and then doing a simple, fast scan. This is the current
>working assumption for postgres.
This is what scares me a little. Why would the PG cache lookup be complex?
That's a question for me to find the answer to... I'll wade through the
developer resources and source code etc. before bothering the experts.
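For what it's worth, my naive mental model of such a lookup - and I stress
this is not taken from the pg source, just a sketch of the general
technique - is a hash on a (relation, block) tag, which should only be a
handful of memory reads per lookup:

/*
 * Toy model of a shared-buffer lookup, NOT the PostgreSQL implementation:
 * cached pages are found by hashing a (relation, block) tag into a table
 * of buffer slots and probing from there.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE   8192      /* default postgresql page size */
#define NUM_BUFFERS 1024      /* hypothetical number of cached pages */

typedef struct {
    uint32_t rel_id;          /* which table/index the page belongs to */
    uint32_t block_no;        /* which 8KB block within it */
} BufferTag;

typedef struct {
    bool      valid;
    BufferTag tag;
    char      page[PAGE_SIZE];
} BufferSlot;

static BufferSlot buffers[NUM_BUFFERS];

static uint32_t hash_tag(BufferTag tag)
{
    return (tag.rel_id * 2654435761u) ^ (tag.block_no * 40503u);
}

/* Open-addressing lookup: hash the tag, then probe linearly. */
static BufferSlot *lookup_buffer(BufferTag tag)
{
    uint32_t h = hash_tag(tag);
    for (int probe = 0; probe < NUM_BUFFERS; probe++) {
        BufferSlot *slot = &buffers[(h + probe) % NUM_BUFFERS];
        if (!slot->valid)
            return NULL;                       /* page is not cached */
        if (slot->tag.rel_id == tag.rel_id &&
            slot->tag.block_no == tag.block_no)
            return slot;                       /* cache hit */
    }
    return NULL;
}

int main(void)
{
    /* Pretend one page is cached, then look it up. */
    BufferTag tag = { .rel_id = 16384, .block_no = 7 };
    BufferSlot *slot = &buffers[hash_tag(tag) % NUM_BUFFERS];
    slot->valid = true;
    slot->tag = tag;

    printf("lookup: %s\n", lookup_buffer(tag) ? "hit" : "miss");
    return 0;
}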
>Your contention, correct me if I've gotten this wrong, is that transferring
>it from the OS cache will be slower than finding it using a more complex
>lookup, but without moving the data from the OS cache to the PG workspace.
>In both scenarios all data is in RAM.
Well, if the PG cache is slower at dealing with large amounts of data, then yes, I
concede that this changes the picture. But at this point in time, my gut
reaction would be to assume that the PG cache is blazingly efficient. It is,
after all, the heart of the DB, and all the DBs I've seen have a similar
architecture.
>I see what you mean about a waste of memory, but if the costing parameters
>are tuned well for the use of the specific DB then current evidence with
>postgres shows that the former strategy is faster. I would be interested
>to see if the latter scenario could be made to work with the current PG
>codebase, or even if there is a simple patch that could speed up the
>shared_buffers case.
Well, it's still not clear that the memory does get wasted, in which case
this 'problem' doesn't exist (see my comment above about whether a copy
occurs between the linux buffer cache and the postmaster's address space).
If linux returns instantly and it merely looks as though the data was
copied, when in fact only a virtual memory mapping changed, then the only
remaining issue is making the IO itself much quicker (when the disk
actually does need to be read from or written to). Anyway, as I said, I'm
in learning mode right now!
>I guess, though, that the main problem I see with the big shared_buffers
>idea on linux specifically is that SHMEM is slow on linux. Rumor has it
>that the SHMEM implementation on Solaris is much faster than on linux, so
>PG on Solaris may very well benefit from using your approach, but I just
>don't see it on linux, which is where this discussion started IIRC.
Hmm... again more reading for me. It occurs to me that I haven't *fully*
decided on linux at all. Linux is great for developing on a shoestring. At
the end of the day, I'm quite committed to postgresql in my project. I like
a lot of its features and would be loath to move away from it, so it might
be that the final production DB server ends up running pgsql on an OS other
than linux.
Still, it would be nice if it did work under linux.
>I will wait with bated breath ;)
Hehe - I've just read back what I've written, and it seems I have to
suddenly understand the depths of both the pg kernel and the linux kernel.
And I didn't even know about the linux buffer cache until yesterday!
So it'll take a wee while, but I'll be back!
Thanks again for your comments,
Regards,
Andy