On Mon, Oct 25, 2004 at 11:34:25AM -0400, Jan Wieck wrote:
> On 10/22/2004 4:09 PM, Kenneth Marshall wrote:
>
> > On Fri, Oct 22, 2004 at 03:35:49PM -0400, Jan Wieck wrote:
> >> On 10/22/2004 2:50 PM, Simon Riggs wrote:
> >>
> >> >I've been using the ARC debug options to analyse memory usage on the
> >> >PostgreSQL 8.0 server. This is a precursor to more complex performance
> >> >analysis work on the OSDL test suite.
> >> >
> >> >I've simplified some of the ARC reporting into a single log line, which
> >> >is enclosed here as a patch on freelist.c. This includes reporting of:
> >> >- the total memory in use, which wasn't previously reported
> >> >- the cache hit ratio, which was slightly incorrectly calculated
> >> >- a useful-ish value for looking at the "B" lists in ARC
> >> >(This is a patch against cvstip, but I'm not sure whether this has
> >> >potential for inclusion in 8.0...)
> >> >
> >> >The total memory in use is useful because it allows you to tell whether
> >> >shared_buffers is set too high. If it is set too high, then memory usage
> >> >will continue to grow slowly up to the max, without any corresponding
> >> >increase in cache hit ratio. If shared_buffers is too small, then memory
> >> >usage will climb quickly and linearly to its maximum.
> >> >
> >> >The last one I've called "turbulence" in an attempt to ascribe some
> >> >useful meaning to B1/B2 hits - I've tried a few other measures though
> >> >without much success. Turbulence is the hit ratio of B1+B2 lists added
> >> >together. By observation, this is zero when ARC gives smooth operation,
> >> >and goes above zero otherwise. Typically, turbulence occurs when
> >> >shared_buffers is too small for the working set of the database/workload
> >> >combination and ARC repeatedly re-balances the lengths of T1/T2 as a
> >> >result of "near-misses" on the B1/B2 lists. Turbulence doesn't usually
> >> >cut in until the cache is fully utilized, so there is usually some delay
> >> >after startup.
> >> >
> >> >We also recently discussed that I would add some further memory analysis
> >> >features for 8.1, so I've been trying to figure out how.
> >> >
> >> >The idea that B1, B2 represent something really useful doesn't seem to
> >> >have been borne out - though I'm open to persuasion there.
> >> >
> >> >I originally envisaged a "shadow list" operating in extension of the
> >> >main ARC list. This will require some re-coding, since the variables and
> >> >macros are all hard-coded to a single set of lists. No complaints, just
> >> >it will take a little longer than we all thought (for me, that is...)
> >> >
> >> >My proposal is to alter the code to allow an array of memory linked
> >> >lists. The actual list would be [0] - other additional lists would be
> >> >created dynamically as required i.e. not using IFDEFs, since I want this
> >> >to be controlled by a SIGHUP GUC to allow on-site tuning, not just lab
> >> >work. This will then allow reporting against the additional lists, so
> >> >that cache hit ratios can be seen with various other "prototype"
> >> >shared_buffer settings.
> >>
> >> All the existing lists live in shared memory, so that dynamic approach
> >> suffers from the fact that the memory has to be allocated during ipc_init.
> >>
> >> What do you think about my other theory to make C actually 2x effective
> >> cache size and NOT to keep T1 in shared buffers but to assume T1 lives
> >> in the OS buffer cache?
> >>
> >>
> >> Jan
> >>
> > Jan,
> >
> > From the articles that I have seen on the ARC algorithm, I do not think
> > that using the effective cache size to set C would be a win. The design
> > of the ARC process is to allow the cache to optimize its use in response
> > to the actual workload. It may be the best use of the cache in some cases
> > to have the entire cache allocated to T1 and similarly for T2. In fact,
> > the ability to alter the behavior as needed is one of the key advantages.
>
> Only the "working set" of the database, that is the pages that are very
> frequently used, are worth holding in shared memory at all. The rest
> should be copied in and out of the OS disc buffers.
>
> The problem is, with a too small directory ARC cannot guesstimate what
> might be in the kernel buffers. Nor can it guesstimate what recently was
> in the kernel buffers and got pushed out from there. That results in a
> way too small B1 list, and therefore we don't get B1 hits when in fact
> the data was found in memory. B1 hits are what increase the T1target,
> and since we are missing them with a too small directory size, our
> implementation of ARC is probably using a T2 size larger than the
> working set. That is not optimal.
>
> If we would replace the dynamic T1 buffers with a max_backends*2 area of
> shared buffers, use a C value representing the effective cache size and
> limit the T1target on the lower bound to effective cache size - shared
> buffers, then we basically moved the T1 cache into the OS buffers.
>
> This all only holds water if the OS is allowed to swap out shared
> memory. And that was my initial question, how likely is it to find this
> to be true these days?
>
>
> Jan
>
I've asked our Linux kernel guys some quick questions, and they say
you can lock mmapped memory with mlock() and System V shared memory
with SHM_LOCK. Otherwise the OS will swap out memory as it sees
fit, whether or not it's shared.
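
For anyone who wants to check this on their own box, here is a minimal
sketch of the two locking calls, assuming Linux with glibc. The helper
names try_lock_mmap() and try_lock_sysv() are made up for illustration
and are not anything in the PostgreSQL tree. Whether the lock is granted
depends on privileges and on the RLIMIT_MEMLOCK limit, so each helper
reports "refused" separately from "could not create the region".

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* for MAP_ANONYMOUS and SHM_LOCK on glibc */
#endif
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/mman.h>
#include <sys/shm.h>

/* Hypothetical helper: map an anonymous shared region and try to pin it.
 * Returns 0 if mlock() succeeded, 1 if the mapping worked but the lock
 * was refused, -1 if the mapping itself could not be created. */
int try_lock_mmap(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    int rc = (mlock(p, len) == 0) ? 0 : 1;
    if (rc == 0)
        munlock(p, len);       /* undo the pin before unmapping */
    munmap(p, len);
    return rc;
}

/* Hypothetical helper: create a private System V segment and try to pin
 * it with SHM_LOCK. Same return convention as try_lock_mmap(). */
int try_lock_sysv(size_t len)
{
    int id = shmget(IPC_PRIVATE, len, IPC_CREAT | 0600);
    if (id == -1)
        return -1;

    int rc = (shmctl(id, SHM_LOCK, NULL) == 0) ? 0 : 1;
    if (rc == 0)
        shmctl(id, SHM_UNLOCK, NULL);
    shmctl(id, IPC_RMID, NULL); /* always remove the test segment */
    return rc;
}
```

Calling try_lock_mmap(4096) and try_lock_sysv(4096) as the user the
postmaster runs under should show whether locking is permitted there. A
return of 1 means the region would stay swappable, which is the default
case described above.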
Mark