Re: reindex/vacuum locking/performance?

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Neil Conway <neilc(at)samurai(dot)com>, Andrew Sullivan <andrew(at)libertyrms(dot)info>, PostgreSQL Performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: reindex/vacuum locking/performance?
Date: 2003-10-06 18:56:59
Message-ID: 200310061856.h96IuxO13756@candle.pha.pa.us
Lists: pgsql-performance

Tom Lane wrote:
> Neil Conway <neilc(at)samurai(dot)com> writes:
> > On Sun, 2003-10-05 at 19:50, Neil Conway wrote:
> > I was hoping you'd reply to this, Tom -- you were referring to O_DIRECT,
> > right?
>
> Not necessarily --- as you point out, it's not clear that O_DIRECT would
> help us. What would be way cool is something similar to what James
> Rogers was talking about: a way to tell the kernel not to promote this
> page all the way to the top of its LRU list. I'm not sure that *any*
> Unixen have such an API, let alone one that's common across more than
> one platform :-(

Solaris has "free-behind", which prevents a large sequential scan from
blowing out the kernel cache.

I only read about it in the Mauro Solaris Internals book, and it seems
to be done automatically. I guess most OS's don't do this optimization
because they usually don't read files larger than their cache.

I see BSD/OS madvise() has:

#define MADV_NORMAL 0 /* no further special treatment */
#define MADV_RANDOM 1 /* expect random page references */
#define MADV_SEQUENTIAL 2 /* expect sequential references */
#define MADV_WILLNEED 3 /* will need these pages */
--> #define MADV_DONTNEED 4 /* don't need these pages */
#define MADV_SPACEAVAIL 5 /* insure that resources are reserved */

The marked one seems to have the control we need. Of course, the kernel
madvise() code has:

/* Not yet implemented */

Looks like NetBSD implements it, but it also unmaps the page from the
address space, which might be more than we want. NetBSD also has:

#define MADV_FREE 6 /* pages are empty, free them */

which frees the page. I am unclear on its use.

FreeBSD has this comment:

/*
* vm_page_dontneed
*
* Cache, deactivate, or do nothing as appropriate. This routine
* is typically used by madvise() MADV_DONTNEED.
*
* Generally speaking we want to move the page into the cache so
* it gets reused quickly. However, this can result in a silly syndrome
* due to the page recycling too quickly. Small objects will not be
* fully cached. On the otherhand, if we move the page to the inactive
* queue we wind up with a problem whereby very large objects
* unnecessarily blow away our inactive and cache queues.
*
* The solution is to move the pages based on a fixed weighting. We
* either leave them alone, deactivate them, or move them to the cache,
* where moving them to the cache has the highest weighting.
* By forcing some pages into other queues we eventually force the
* system to balance the queues, potentially recovering other unrelated
* space from active. The idea is to not force this to happen too
* often.
*/

The Linux comment is:

/*
* Application no longer needs these pages. If the pages are dirty,
* it's OK to just throw them away. The app will be more careful about
* data it wants to keep. Be sure to free swap resources too. The
* zap_page_range call sets things up for refill_inactive to actually free
* these pages later if no one else has touched them in the meantime,
* although we could add these pages to a global reuse list for
* refill_inactive to pick up before reclaiming other pages.
*
* NB: This interface discards data rather than pushes it out to swap,
* as some implementations do. This has performance implications for
* applications like large transactional databases which want to discard
* pages in anonymous maps after committing to backing store the data
* that was kept in them. There is no reason to write this data out to
* the swap area if the application is discarding it.
*
* An interface that causes the system to free clean pages and flush
* dirty pages is already available as msync(MS_INVALIDATE).
*/

It seems mmap is more for controlling the memory mapping of files than
for controlling the cache itself.
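
As a rough illustration of the kind of control MADV_DONTNEED is meant to
give, here is a minimal C sketch. It assumes a POSIX-like system where
madvise() is actually implemented for these flags, which, as noted above,
is not a given. The helper name scan_and_release and its chunk_pages
parameter are hypothetical and are not taken from any of the kernels
quoted here. It maps a file read-only, hints sequential access, and after
consuming each chunk tells the kernel those pages are no longer needed.
Whether that only lowers the pages' priority or drops them from the
mapping outright varies by platform, so treat it as a hint only.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE          /* expose madvise() and MADV_* on glibc */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: sum every byte of a file through a read-only
 * mapping, advising the kernel after each chunk that those pages are
 * no longer needed.  Returns 0 on success, -1 on error.
 */
int
scan_and_release(const char *path, size_t chunk_pages, uint64_t *sum_out)
{
    struct stat st;
    long        pagesize = sysconf(_SC_PAGESIZE);
    int         fd;

    if (chunk_pages == 0 || sum_out == NULL)
        return -1;
    assert(pagesize > 0);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0)
    {
        close(fd);
        return -1;
    }

    size_t      len = (size_t) st.st_size;

    *sum_out = 0;
    if (len == 0)
    {
        close(fd);              /* nothing to map or scan */
        return 0;
    }

    unsigned char *base = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);

    close(fd);                  /* the mapping stays valid after close */
    if (base == MAP_FAILED)
        return -1;

    /* Advisory only: a kernel may ignore or reject these hints. */
    (void) madvise(base, len, MADV_SEQUENTIAL);

    size_t      chunk = (size_t) pagesize * chunk_pages;
    uint64_t    sum = 0;

    for (size_t off = 0; off < len; off += chunk)
    {
        size_t      n = (len - off < chunk) ? len - off : chunk;

        for (size_t i = 0; i < n; i++)
            sum += base[off + i];

        /* base + off is page-aligned because chunk is a page multiple */
        (void) madvise(base + off, n, MADV_DONTNEED);
    }

    munmap(base, len);
    *sum_out = sum;
    return 0;
}
```

A caller doing a large sequential read might invoke it as
scan_and_release(path, 256, &sum), picking the chunk size to trade off
madvise() call overhead against how long scanned pages linger. Because
the mapping is read-only and file-backed, discarded pages can always be
re-read from the file, so the differing "discard" semantics quoted above
do not risk data loss in this particular sketch.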

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
