From: | Jeff Layton <jlayton(at)redhat(dot)com> |
---|---|
To: | Dave Chinner <david(at)fromorbit(dot)com> |
Cc: | Marti Raudsepp <marti(at)juffo(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Joshua Drake <jd(at)commandprompt(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Mel Gorman <mgorman(at)suse(dot)de>, Jim Nasby <jim(at)nasby(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "lsf-pc(at)lists(dot)linux-foundation(dot)org" <lsf-pc(at)lists(dot)linux-foundation(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net> |
Subject: | Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance |
Date: | 2014-01-20 13:43:34 |
Message-ID: | 20140120084334.775641f0@tlielax.poochiereds.net |
Lists: | pgsql-hackers |
On Mon, 20 Jan 2014 10:51:41 +1100
Dave Chinner <david(at)fromorbit(dot)com> wrote:
> On Sun, Jan 19, 2014 at 03:37:37AM +0200, Marti Raudsepp wrote:
> > On Wed, Jan 15, 2014 at 5:34 AM, Jim Nasby <jim(at)nasby(dot)net> wrote:
> > > it's very common to create temporary file data that will never, ever, ever
> > > actually NEED to hit disk. Where I work being able to tell the kernel to
> > > avoid flushing those files unless the kernel thinks it's got better things
> > > to do with that memory would be EXTREMELY valuable
> >
> > Windows has the FILE_ATTRIBUTE_TEMPORARY flag for this purpose.
> >
> > ISTR that there was discussion about implementing something analogous
> > in Linux when ext4 got delayed allocation support, but I don't think
> > it got anywhere and I can't find the discussion now. I think the
> > proposed interface was to create and then unlink the file immediately,
> > which serves as a hint that the application doesn't care about
> > persistence.
>
> You're thinking about O_TMPFILE, which is for making temp files that
> can't be seen in the filesystem namespace, not for preventing them
> from being written to disk.
>
> I don't really like the idea of overloading a namespace directive to
> have special writeback connotations. What we are getting into the
> realm of here is generic user controlled allocation and writeback
> policy...
>
Agreed -- O_TMPFILE semantics are a different beast entirely.
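To make the namespace-versus-writeback distinction concrete, here is a minimal Python sketch, assuming Linux. It creates an unnamed file with `O_TMPFILE` and, where the kernel, filesystem, or Python build lacks it, falls back to the create-then-unlink idiom Marti describes. The helper name `anonymous_tempfile` is hypothetical. In both cases the file simply has no directory entry; nothing in either call asks the kernel to keep its dirty pages out of writeback.

```python
import errno
import os
import tempfile


def anonymous_tempfile(directory: str) -> int:
    """Return a read/write fd for a file with no name in `directory`.

    Hypothetical helper for illustration only. Prefers O_TMPFILE and
    falls back to create-then-unlink. Neither path changes writeback.
    """
    flag = getattr(os, "O_TMPFILE", None)  # absent on non-Linux builds
    if flag is not None:
        try:
            # The path names a directory; the new inode gets no link in it.
            return os.open(directory, flag | os.O_RDWR, 0o600)
        except OSError as exc:
            # EOPNOTSUPP: filesystem does not support O_TMPFILE.
            # EISDIR: kernel predates O_TMPFILE (flag seen as O_DIRECTORY).
            # EINVAL: treated defensively as "unsupported" here.
            if exc.errno not in (errno.EOPNOTSUPP, errno.EISDIR, errno.EINVAL):
                raise
    # Fallback: create a named file, then drop its name but keep the fd.
    fd, path = tempfile.mkstemp(dir=directory)
    os.unlink(path)
    return fd
```

Reads and writes go through the returned descriptor as usual, and closing it releases the file.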
Perhaps what might be reasonable, though, is a fadvise POSIX_FADV_TMPFILE
hint that tells the kernel: "Don't write out this data unless it's
necessary due to memory pressure".
If the inode is only open with file descriptors that have that hint
set on them, then we could exempt it from dirty_expire_interval and
dirty_writeback_interval?
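The two kernel variables named above are exposed to userspace as the `vm.dirty_expire_centisecs` and `vm.dirty_writeback_centisecs` sysctls (kernel defaults 3000 and 500, i.e. 30 s and 5 s). The sketch below, assuming a Linux `/proc`, reads them so the timers a hinted file would be exempted from are concrete; `writeback_timers` and `centisecs_to_seconds` are hypothetical helper names.

```python
from pathlib import Path

SYSCTL_DIR = Path("/proc/sys/vm")


def centisecs_to_seconds(text: str) -> float:
    """Convert a sysctl value in hundredths of a second to seconds."""
    return int(text.strip()) / 100


def writeback_timers(sysctl_dir: Path = SYSCTL_DIR) -> dict:
    """Read the periodic-writeback timers (hypothetical helper).

    dirty_expire_centisecs    backs the kernel's dirty_expire_interval:
        how old dirty data must be before the flusher writes it out.
    dirty_writeback_centisecs backs the kernel's dirty_writeback_interval:
        how often the flusher threads wake up (0 disables periodic wakeups).
    """
    return {
        name: centisecs_to_seconds((sysctl_dir / name).read_text())
        for name in ("dirty_expire_centisecs", "dirty_writeback_centisecs")
    }
```

With the kernel defaults, `writeback_timers()` would report 30.0 and 5.0 seconds; distributions and administrators may tune either value.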
Tracking that desire on an inode open multiple times might be
"interesting" though. We'd have to be quite careful not to allow that
to open an attack vector.
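The rule proposed above (exempt an inode from periodic writeback only while every open descriptor carries the hint) amounts to per-inode reference counting. Below is a toy Python model of that bookkeeping, not kernel code; `InodeHintState` and its methods are hypothetical. It does not address the attack-vector concern: for example, once the original writer closes its unhinted descriptor, any remaining hinted opener would re-enable the exemption.

```python
class InodeHintState:
    """Toy model of the proposed per-inode bookkeeping (hypothetical names).

    The inode is exempt from the periodic expire/writeback timers only
    while it is open and every open descriptor carries the hint.
    Writeback under memory pressure is outside this model.
    """

    def __init__(self) -> None:
        self.open_count = 0    # all open descriptors on the inode
        self.hinted_count = 0  # the subset that set the hint

    def open(self) -> None:
        """A new descriptor starts out without the hint."""
        self.open_count += 1

    def set_hint(self) -> None:
        """fadvise-style: mark one already-open descriptor as hinted."""
        if self.hinted_count >= self.open_count:
            raise ValueError("no unhinted descriptor to mark")
        self.hinted_count += 1

    def close(self, hinted: bool) -> None:
        unhinted = self.open_count - self.hinted_count
        if (hinted and self.hinted_count == 0) or (not hinted and unhinted == 0):
            raise ValueError("close without a matching open")
        self.open_count -= 1
        if hinted:
            self.hinted_count -= 1

    def exempt_from_periodic_writeback(self) -> bool:
        return self.open_count > 0 and self.hinted_count == self.open_count
```

For example, after one `open()` followed by `set_hint()` the inode is exempt; a second, unhinted `open()` cancels the exemption until that descriptor is closed.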
> > Postgres is far from being the only application that wants this; many
> > people resort to tmpfs because of this:
> > https://lwn.net/Articles/499410/
>
> Yes, we covered the possibility of using tmpfs much earlier in the
> thread, and came to the conclusion that temp files can be larger
> than memory so tmpfs isn't the solution here. :)
>
--
Jeff Layton <jlayton(at)redhat(dot)com>