From: | Dave Chinner <david(at)fromorbit(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Mel Gorman <mgorman(at)suse(dot)de>, Josh Berkus <josh(at)agliodbs(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Joshua Drake <jd(at)commandprompt(dot)com>, "lsf-pc(at)lists(dot)linux-foundation(dot)org" <lsf-pc(at)lists(dot)linux-foundation(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net> |
Subject: | Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance |
Date: | 2014-01-16 00:19:10 |
Message-ID: | 20140116001910.GO3431@dastard |
Lists: | pgsql-hackers |
On Wed, Jan 15, 2014 at 07:08:18PM -0500, Tom Lane wrote:
> Dave Chinner <david(at)fromorbit(dot)com> writes:
> > On Wed, Jan 15, 2014 at 10:12:38AM -0500, Tom Lane wrote:
> >> What we'd really like for checkpointing is to hand the kernel a boatload
> >> (several GB) of dirty pages and say "how about you push all this to disk
> >> over the next few minutes, in whatever way seems optimal given the storage
> >> hardware and system situation. Let us know when you're done."
>
> > The issue there is that the kernel has other triggers for needing to
> > clean data. We have no infrastructure to handle variable writeback
> > deadlines at the moment, nor do we have any infrastructure to do
> > roughly metered writeback of such files to disk. I think we could
> > add it to the infrastructure without too much perturbation of the
> > code, but as you've pointed out that still leaves the fact there's
> > no obvious interface to configure such behaviour. Would it need to
> > be persistent?
>
> No, we'd be happy to re-request it during each checkpoint cycle, as
> long as that wasn't an unduly expensive call to make. I'm not quite
> sure where such requests ought to "live" though. One idea is to tie
> them to file descriptors; but the data to be written might be spread
> across more files than we really want to keep open at one time.

It would be a property of the inode, as that is how writeback is
tracked and timed. It would be set and queried through a file
descriptor, though; that's basically the same context that fadvise
works through.
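No deadline-style writeback hint exists in the kernel today, so none is shown here. As a rough illustration of the fd-scoped context being discussed, the following is a minimal sketch of how an application can approximate "roughly metered writeback" itself with calls that do exist: it starts writeback of a file in chunks via the Linux-specific `sync_file_range(..., SYNC_FILE_RANGE_WRITE)` and pauses between chunks. The function name `metered_writeback` and its chunk/pause parameters are hypothetical choices for illustration. Note that `sync_file_range` only initiates writeback and gives no durability guarantee (no metadata or device-cache flush), so a caller would still finish with `fdatasync`.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>   /* sync_file_range (Linux-specific, needs _GNU_SOURCE) */
#include <time.h>    /* nanosleep */
#include <unistd.h>  /* fdatasync, for the caller's final durability step */

/* Hypothetical helper: start writeback of [0, size) in chunk-sized pieces,
 * sleeping pause_ns between pieces so dirty data trickles out rather than
 * reaching the device all at once. Returns the number of chunks submitted,
 * or -1 if sync_file_range fails. It does NOT make the data durable. */
static long metered_writeback(int fd, off_t size, off_t chunk, long pause_ns)
{
    assert(chunk > 0 && pause_ns >= 0 && pause_ns < 1000000000L);
    struct timespec gap = { 0, pause_ns };
    long chunks = 0;

    for (off_t off = 0; off < size; off += chunk) {
        off_t len = (size - off < chunk) ? size - off : chunk;
        /* Initiate writeback of this range without waiting for completion. */
        if (sync_file_range(fd, off, len, SYNC_FILE_RANGE_WRITE) != 0)
            return -1;
        chunks++;
        /* Pause only if another chunk follows. */
        if (off + chunk < size)
            nanosleep(&gap, NULL);
    }
    return chunks;
}
```

The pacing lives entirely in user space here, which is exactly what a kernel-side, per-inode deadline would make unnecessary.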

> But the only other idea that comes to mind is some kind of global sysctl,
> which would probably have security and permissions issues. (One thing
> that hasn't been mentioned yet in this thread, but maybe is worth pointing
> out now, is that Postgres does not run as root, and definitely doesn't
> want to. So we don't want a knob that would require root permissions
> to twiddle.)

I have assumed all along that requiring root to do stuff would be a
bad thing. :)

> We could probably live with serially checkpointing data
> in sets of however-many-files-we-can-have-open, if file descriptors are
> the place to keep the requests.

Inodes live longer than file descriptors, but there's no guarantee
that they live from one fd context to another. Hence my question
about persistence ;)
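Tom's fallback of checkpointing serially, in sets of however many files can be open at once, could look roughly like the sketch below. It is an illustration under stated assumptions, not PostgreSQL's checkpointer: `checkpoint_in_batches`, `flush_batch` and the 64-descriptor cap are hypothetical. Within each batch it first starts writeback on every open file with the Linux-specific `sync_file_range(..., SYNC_FILE_RANGE_WRITE)` (length 0 means "to end of file"), then waits for each with `fdatasync`, and closes the whole batch before opening the next. Any per-fd hint of the kind discussed above would have to be re-issued each time a file is reopened.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>   /* open, sync_file_range */
#include <stddef.h>
#include <unistd.h>  /* close, fdatasync */

/* Hypothetical helper: kick off writeback on every fd in the batch first,
 * then wait for each one, so the batch's I/O is in flight together. */
static int flush_batch(const int *fds, size_t count)
{
    for (size_t i = 0; i < count; i++)
        if (sync_file_range(fds[i], 0, 0, SYNC_FILE_RANGE_WRITE) != 0)
            return -1;
    for (size_t i = 0; i < count; i++)
        if (fdatasync(fds[i]) != 0)
            return -1;
    return 0;
}

/* Hypothetical helper: durably write back n files while holding at most
 * max_open descriptors at a time. Returns the number of files flushed, or
 * -1 on error; *batches receives the number of batches used. */
static long checkpoint_in_batches(const char *const *paths, size_t n,
                                  size_t max_open, size_t *batches)
{
    assert(max_open > 0 && max_open <= 64);
    int fds[64];
    size_t done = 0;
    *batches = 0;

    while (done < n) {
        size_t count = 0;
        int rc = 0;
        /* Open the next group of files. */
        while (count < max_open && done + count < n) {
            int fd = open(paths[done + count], O_WRONLY);
            if (fd < 0) { rc = -1; break; }
            fds[count++] = fd;
        }
        if (rc == 0)
            rc = flush_batch(fds, count);
        /* Close this batch before opening the next one. */
        for (size_t i = 0; i < count; i++)
            close(fds[i]);
        if (rc != 0)
            return -1;
        done += count;
        (*batches)++;
    }
    return (long)done;
}
```

With five files and `max_open` set to 2, this runs three batches (2 + 2 + 1).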

Cheers,
Dave.
--
Dave Chinner
david(at)fromorbit(dot)com