From: | Dave Chinner <david(at)fromorbit(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Mel Gorman <mgorman(at)suse(dot)de>, Josh Berkus <josh(at)agliodbs(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Joshua Drake <jd(at)commandprompt(dot)com>, "lsf-pc(at)lists(dot)linux-foundation(dot)org" <lsf-pc(at)lists(dot)linux-foundation(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net> |
Subject: | Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance |
Date: | 2014-01-15 23:57:35 |
Message-ID: | 20140115235735.GL3431@dastard |
Lists: | pgsql-hackers |
On Wed, Jan 15, 2014 at 10:12:38AM -0500, Tom Lane wrote:
> Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> writes:
> > On 01/15/2014 07:50 AM, Dave Chinner wrote:
> >> FWIW [and I know you're probably sick of hearing this by now],
> >> the blk-io throttling works almost perfectly with applications that
> >> use direct IO.....
>
> > For checkpoint writes, direct I/O actually would be reasonable.
> > Bypassing the OS cache is a good thing in that case - we don't want the
> > written pages to evict other pages from the OS cache, as we already have
> > them in the PostgreSQL buffer cache.
>
> But in exchange for that, we'd have to deal with selecting an order to
> write pages that's appropriate depending on the filesystem layout,
> other things happening in the system, etc etc. We don't want to build
> an I/O scheduler, IMO, but we'd have to.
I don't see that as necessary - nobody else needs to do this with
direct IO. Indeed, if the application does ascending offset order
writeback from within a file, then it's replicating exactly what the
kernel page cache writeback does. If what the kernel does is good
enough for you, then I can't see how doing the same thing with
a background thread doing direct IO is going to need any special
help....
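A minimal sketch of the ascending-offset ordering described above, assuming the writer already holds a list of dirty block numbers for one file. It sorts them and merges neighbouring blocks into contiguous runs, each of which could then be issued as one direct-I/O write. The names `plan_writeback`, `cmp_block` and `struct run` are hypothetical illustrations, not PostgreSQL or kernel APIs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical: a run of consecutive dirty blocks, [start, start + count). */
struct run {
    uint32_t start;
    uint32_t count;
};

/* qsort comparator for block numbers, avoiding overflow-prone subtraction. */
static int cmp_block(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *) a;
    uint32_t y = *(const uint32_t *) b;
    return (x > y) - (x < y);
}

/*
 * Sort 'blocks' into ascending order (in place) and merge neighbours
 * into runs. 'runs' must have room for n entries. Duplicate block
 * numbers are written only once. Returns the number of runs.
 */
size_t plan_writeback(uint32_t *blocks, size_t n, struct run *runs)
{
    size_t nruns = 0;

    if (n == 0)
        return 0;
    qsort(blocks, n, sizeof blocks[0], cmp_block);

    runs[nruns++] = (struct run) { blocks[0], 1 };
    for (size_t i = 1; i < n; i++) {
        struct run *cur = &runs[nruns - 1];
        uint32_t next = cur->start + cur->count;   /* block just past the run */

        if (blocks[i] < next)
            continue;                   /* duplicate of the run's last block */
        if (blocks[i] == next)
            cur->count++;               /* extends the current run */
        else
            runs[nruns++] = (struct run) { blocks[i], 1 };  /* gap: new run */
    }
    assert(nruns <= n);
    return nruns;
}
```

Each run then corresponds to one contiguous byte range of the file, visited in ascending order, which is the same pattern the page cache writeback produces.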
> > Writing one page at a time with O_DIRECT from a single process might be
> > quite slow, so we'd probably need to use writev() or asynchronous I/O to
> > work around that.
>
> Yeah, and if the system has multiple spindles, we'd need to be issuing
> multiple O_DIRECT writes concurrently, no?
>
> What we'd really like for checkpointing is to hand the kernel a boatload
> (several GB) of dirty pages and say "how about you push all this to disk
> over the next few minutes, in whatever way seems optimal given the storage
> hardware and system situation. Let us know when you're done."
The issue there is that the kernel has other triggers for needing to
clean data. We have no infrastructure to handle variable writeback
deadlines at the moment, nor do we have any infrastructure to do
roughly metered writeback of such files to disk. I think we could
add it to the infrastructure without too much perturbation of the
code, but as you've pointed out, that still leaves the fact that there's
no obvious interface to configure such behaviour. Would it need to
be persistent?
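Heikki's concern about issuing one O_DIRECT write per page can be addressed with vectored I/O. The sketch below assumes hypothetical helpers (`write_run`, `open_direct`) and an 8 kB `BLCKSZ`; it gathers pages that belong at consecutive file offsets, but sit at scattered buffer addresses, into batched `pwritev()` calls. With `O_DIRECT`, Linux additionally requires the buffers, file offset and length to be aligned to the device's logical block size (for example, buffers from `posix_memalign()`).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define BLCKSZ 8192      /* assumed page size (PostgreSQL's default) */
#define BATCH  64        /* arbitrary number of pages per pwritev() call */

/* Hypothetical: open a data file for direct I/O (Linux-specific flag). */
int open_direct(const char *path)
{
    return open(path, O_WRONLY | O_DIRECT);
}

/*
 * Write 'count' pages to consecutive blocks starting at 'start_block'.
 * pages[i] holds the page for block start_block + i; the pages need not
 * be adjacent in memory. Returns 0 on success, -1 on error (errno set).
 * A short write is treated as an error to keep the sketch simple.
 */
int write_run(int fd, char *const *pages, size_t count, unsigned start_block)
{
    struct iovec iov[BATCH];
    size_t done = 0;

    assert(count == 0 || pages != NULL);
    while (done < count) {
        size_t n = count - done;
        if (n > BATCH)
            n = BATCH;
        for (size_t i = 0; i < n; i++) {
            iov[i].iov_base = pages[done + i];
            iov[i].iov_len = BLCKSZ;
        }
        off_t off = (off_t) (start_block + done) * BLCKSZ;
        ssize_t w = pwritev(fd, iov, (int) n, off);
        if (w < 0) {
            if (errno == EINTR)
                continue;               /* retry the same batch */
            return -1;
        }
        if ((size_t) w != n * BLCKSZ) {
            errno = EIO;
            return -1;
        }
        done += n;
    }
    return 0;
}
```

This only removes the per-page syscall overhead. Keeping several spindles busy, as Tom points out, would still need concurrent writers, such as one such loop per thread or asynchronous I/O.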
Cheers,
Dave.
--
Dave Chinner
david(at)fromorbit(dot)com