From: Curt Sampson <cjs(at)cynic(dot)net>
To: Jan Wieck <JanWieck(at)Yahoo(dot)com>
Cc: GB Clark <postgres(at)vsservices(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, <glenebob(at)nwlink(dot)com>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: Linux max on shared buffers?
Date: 2002-07-20 09:30:42
Message-ID: Pine.NEB.4.44.0207201818160.553-100000@angelic.cynic.net
Lists: pgsql-general
On Fri, 19 Jul 2002, Jan Wieck wrote:
> I still don't completely understand what you are proposing. What I
> understood so far is that you want to avoid double buffering (OS buffer
> plus SHMEM). Wouldn't that require that access to a block in the file
> (table, index, sequence, ...) go directly through an mmapped region of
> that file?
Right.
> Let's create a little test case to discuss. I have two tables, 2
> gigabytes in size each (making 4 segments of 1 GB apiece in total),
> plus a 512 MB index for each. Now I join them in a query that results
> in a nestloop doing index scans.
>
> On a 32-bit system you cannot mmap both tables plus the indexes
> completely at the same time. But the execution plan's access pattern is
> to read one table's index, fetch the heap tuples from it by random
> access, and inside that loop do the same for the second table. So
> chances are that this plan randomly peeks around in the entire 5
> gigabytes; at least you cannot predict which blocks it will need.
Well, you can certainly predict the index blocks. So after some initial
reads to get to the bottom level of the index, you might map a few
megabytes of it contiguously because you know you'll need it. While
you're at it, you can inform the OS you're using it sequentially
(so it can do read-ahead--even though it otherwise looks to the OS
like the process is doing random reads) by doing an madvise() with
MADV_SEQUENTIAL.
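
Something like this minimal sketch (not PostgreSQL code; the function
name and the 4 MB window size are just assumptions for illustration):

    #include <sys/types.h>
    #include <sys/mman.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    #define MAP_BYTES (4 * 1024 * 1024)   /* assumed window size: 4 MB */

    /* Map MAP_BYTES of an index file starting at 'offset' (which must
     * be page-aligned) and hint that we'll read it front to back. */
    static void *
    map_index_range(const char *path, off_t offset)
    {
        int     fd;
        void   *p;

        if ((fd = open(path, O_RDONLY)) < 0)
            return NULL;

        p = mmap(NULL, MAP_BYTES, PROT_READ, MAP_SHARED, fd, offset);
        close(fd);              /* the mapping survives the close */
        if (p == MAP_FAILED)
            return NULL;

        /* Advisory only: turns on kernel read-ahead for this range,
         * even though the rest of the process looks random to the OS. */
        if (madvise(p, MAP_BYTES, MADV_SEQUENTIAL) < 0)
            perror("madvise");

        return p;
    }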
> So far so good. Now, what do you map, and when? Can you map multiple
> noncontiguous 8K blocks out of each file?
Sure. You can just map one 8K block at a time, and when you've got
lots of mappings, start dropping the ones you've not used for a while,
LRU-style. How many mappings you want to keep "cached" for your process
would depend on the cost of re-establishing a mapping (an extra system
call) versus the cost of holding many mappings open at once. Personally,
I think that the overhead of having tons of mappings is pretty low, but
I'll have to read through some kernel code to make sure. At any rate,
it's no problem to change the figure depending on any factor you like.
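
Here's a rough sketch of what such a per-backend mapping cache might
look like (every name here, and the MAX_MAPS figure, is invented for
illustration; real code would hash rather than scan the table):

    #include <stddef.h>
    #include <sys/types.h>
    #include <sys/mman.h>

    #define BLCKSZ   8192
    #define MAX_MAPS 1024       /* tunable: mappings kept per backend */

    typedef struct
    {
        int             fd;         /* file the block came from */
        off_t           offset;     /* block offset, page-aligned */
        void           *addr;       /* NULL if this slot is unused */
        unsigned long   last_used;  /* LRU stamp */
    } BlockMap;

    static BlockMap      maps[MAX_MAPS];
    static unsigned long lru_clock;

    void *
    map_block(int fd, off_t offset)
    {
        int     i;
        int     victim = 0;
        void   *p;

        /* Already mapped?  Just bump its LRU stamp. */
        for (i = 0; i < MAX_MAPS; i++)
            if (maps[i].addr != NULL &&
                maps[i].fd == fd && maps[i].offset == offset)
            {
                maps[i].last_used = ++lru_clock;
                return maps[i].addr;
            }

        /* Find a free slot, or the least recently used one to evict. */
        for (i = 0; i < MAX_MAPS; i++)
        {
            if (maps[i].addr == NULL)
            {
                victim = i;
                break;
            }
            if (maps[i].last_used < maps[victim].last_used)
                victim = i;
        }
        if (maps[victim].addr != NULL)
        {
            munmap(maps[victim].addr, BLCKSZ);  /* drop old mapping */
            maps[victim].addr = NULL;
        }

        p = mmap(NULL, BLCKSZ, PROT_READ | PROT_WRITE, MAP_SHARED,
                 fd, offset);
        if (p == MAP_FAILED)
            return NULL;

        maps[victim].fd = fd;
        maps[victim].offset = offset;
        maps[victim].addr = p;
        maps[victim].last_used = ++lru_clock;
        return p;
    }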
> If so, how do you coordinate things so that, in total, all backends
> use at most the number of blocks you want PostgreSQL to use....
You don't. Just map as much as you like; the operating system takes
care of what blocks will remain in memory or be written out to disk
(or dropped if they're clean), bringing in a block from disk when you
reference one that's not currently in physical memory, and so on.
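
In other words, with the hypothetical map_block() above, a backend just
dereferences the pointer and lets the kernel worry about residency and
write-back:

    /* Touch a block through its mapping; the kernel faults pages in
     * and writes dirty ones back on its own schedule. */
    void
    touch_block(int fd, unsigned int blkno)
    {
        char   *page = map_block(fd, (off_t) blkno * BLCKSZ);
        char    c;

        if (page == NULL)
            return;
        c = page[0];        /* read: may fault the block in from disk */
        (void) c;
        page[100] = 42;     /* write: just dirties the page */
        msync(page, BLCKSZ, MS_ASYNC);  /* optional: start write-back */
    }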
> And if a backend needs a block and the max is reached already, how
> does it tell the other backends to unmap something?
You don't. The mappings are completely separate for every process.
> I assume I am missing something very important here....
Yeah, you're missing that the OS does all of the work for you. :-)
Of course, this only works on systems with a POSIX mmap, which those
particular HP systems Tom mentioned obviously don't have. For those
systems, though, I expect running as a 64-bit program fixes the problem
(because you've got a couple billion times as much address space). But
if postgres runs on 32-bit systems with the same restrictions, we'd
probably just have to keep the option of using read/write instead, and
take the performance hit that we do now.
cjs
--
Curt Sampson <cjs(at)cynic(dot)net> +81 90 7737 2974 http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light. --XTC