From: | Aidan Van Dyk <aidan(at)highrise(dot)ca> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, James Mansion <james(at)mansionfamily(dot)plus(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Bruce Momjian <bruce(at)momjian(dot)us>, jd(at)commandprompt(dot)com, Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>, Steve Crawford <scrawford(at)pinpointresearch(dot)com>, pgsql-performance(at)postgresql(dot)org, Ben Chobot <bench(at)silentmedia(dot)com> |
Subject: | Re: BBU Cache vs. spindles |
Date: | 2010-10-29 15:56:09 |
Message-ID: | AANLkTi=ttpLEgYHTViATjbtS_dm-B5iRz3vOmaVDrw__@mail.gmail.com |
Lists: | pgsql-performance pgsql-www |
On Fri, Oct 29, 2010 at 11:43 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> Well, we COULD keep the data in shared buffers, and then copy it into
> an mmap()'d region rather than calling write(), but I'm not sure
> there's any advantage to it. Managing address space mappings is a
> pain in the butt.
I could see this being a *theoretical* benefit in the case that the
background writer gains the ability to write out all blocks associated
with a file in order. In that case, you might get a win because you
could get a single mmap of the entire file, and just wholesale memcpy
blocks across, then sync/unmap it.
This, of course, assumes a few things that must hold for it to be performant:
0) a list of blocks to be written grouped by files is readily available.
1) The pages you write to must be in the page cache, or your memcpy is
going to fault them in. With a plain write, you don't need the
over-written page in the cache.
2) Now, instead of the torn-page problem being FS block/sector size
based, you can actually have a possibly arbitrary amount of the
block memory written when the kernel writes out the page. You
*really* need full-page-writes.
3) The mmap overhead required for the kernel to set up the mappings is
less than the repeated syscalls of a simple write().
All those things seem like something that somebody could synthetically
benchmark to prove value before even trying to bolt it into PostgreSQL.
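For what it's worth, the two code paths such a benchmark would compare might look roughly like the sketch below. This is only an illustration under assumptions, not anything from the PostgreSQL tree: DirtyBlock, flush_with_pwrite() and flush_with_mmap() are made-up names, the per-file block list from point 0 is assumed to already exist, and BLCKSZ is simply set to 8192 to match the default block size. The file must be opened O_RDWR and already be at least file_blocks * BLCKSZ bytes long, otherwise touching the mapping can raise SIGBUS. Timing is left out.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define BLCKSZ 8192             /* assumed block size for this sketch */

/* Hypothetical: one dirty block destined for a single file. */
typedef struct
{
    size_t      blockno;        /* block number within the file */
    const char *data;           /* BLCKSZ bytes of page content */
} DirtyBlock;

/* Baseline: one pwrite() syscall per block, then a single fsync(). */
static int
flush_with_pwrite(int fd, const DirtyBlock *blocks, size_t n)
{
    assert(blocks != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
    {
        off_t off = (off_t) blocks[i].blockno * BLCKSZ;

        if (pwrite(fd, blocks[i].data, BLCKSZ, off) != BLCKSZ)
            return -1;
    }
    return fsync(fd);
}

/*
 * mmap variant: map the whole file once, memcpy each block into place,
 * then msync() and munmap().  Every destination page that is not already
 * in the page cache gets faulted in by the memcpy (point 1 above).
 */
static int
flush_with_mmap(int fd, size_t file_blocks, const DirtyBlock *blocks, size_t n)
{
    size_t len = file_blocks * BLCKSZ;
    char  *base;
    int    rc;

    assert(blocks != NULL || n == 0);
    base = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED)
        return -1;

    for (size_t i = 0; i < n; i++)
    {
        if (blocks[i].blockno >= file_blocks)
        {
            /* refuse to write past the end of the mapping */
            munmap(base, len);
            return -1;
        }
        memcpy(base + blocks[i].blockno * BLCKSZ, blocks[i].data, BLCKSZ);
    }

    rc = msync(base, len, MS_SYNC);     /* force the dirty pages to disk */
    if (munmap(base, len) != 0)
        rc = -1;
    return rc;
}
```

A benchmark would call both functions on the same block list against files of varying size and cache state, and compare wall-clock time; the cold-cache case is where point 1 should show up most clearly.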
a.
--
Aidan Van Dyk Create like a god,
aidan(at)highrise(dot)ca command like a king,
http://www.highrise.ca/ work like a slave.
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2010-10-29 15:57:06 | Re: BBU Cache vs. spindles |
Previous Message | Robert Haas | 2010-10-29 15:43:58 | Re: BBU Cache vs. spindles |