From: | Claudio Freire <klaussfreire(at)gmail(dot)com> |
---|---|
To: | Hannu Krosing <hannu(at)2ndquadrant(dot)com> |
Cc: | Dave Chinner <david(at)fromorbit(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, James Bottomley <James(dot)Bottomley(at)hansenpartnership(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, "lsf-pc(at)lists(dot)linux-foundation(dot)org" <lsf-pc(at)lists(dot)linux-foundation(dot)org>, Kevin Grittner <kgrittn(at)ymail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Joshua Drake <jd(at)commandprompt(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Mel Gorman <mgorman(at)suse(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Trond Myklebust <trondmy(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net> |
Subject: | Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance |
Date: | 2014-01-14 14:54:06 |
Message-ID: | CAGTBQpYMWCWY0CkrD25vRkyTap-ePBgC=4-skWqT0QQ09gMJjw@mail.gmail.com |
Lists: | pgsql-hackers |
On Tue, Jan 14, 2014 at 11:39 AM, Hannu Krosing <hannu(at)2ndquadrant(dot)com> wrote:
> On 01/14/2014 09:39 AM, Claudio Freire wrote:
>> On Tue, Jan 14, 2014 at 5:08 AM, Hannu Krosing <hannu(at)2ndquadrant(dot)com> wrote:
>>> Again, as said above the linux file system is doing fine. What we
>>> want is a few ways to interact with it to let it do even better when
>>> working with postgresql by telling it some stuff it otherwise would
>>> have to second guess and by sometimes giving it back some cache
>>> pages which were copied away for potential modifying but ended
>>> up clean in the end.
>> You don't need new interfaces. Only a slight modification of what
>> fadvise DONTNEED does.
>>
>> This insistence in injecting pages from postgres to kernel is just a
>> bad idea.
> Do you think it would be possible to map copy-on-write pages
> from linux cache to postgresql cache ?
>
> this would be a step in direction of solving the double-ram-usage
> of pages which have not been read from syscache to postgresql
> cache without sacrificing linux read-ahead (which I assume does
> not happen when reads bypass system cache).
>
> and we can write back the copy at the point when it is safe (from
> postgresql perspective) to let the system write them back ?
>
> Do you think it is possible to make it work with good performance
> for a few million 8kb pages ?
I don't think so. The kernel would need to walk the page mapping on
each page fault, which would incur the cost of a read cache hit every
time.
A read cache hit is still orders of magnitude slower than a regular
page fault, because the process page map is compact and efficient. If
you bloat it, or make the kernel go read the buffer cache on each
fault, RAM access gets slower, which I'd venture isn't really a net
gain.
That's probably why there is no zero-copy read mechanism: you always
have to copy from/to the buffer cache anyway.
Of course, this is just off the top of my head. Without actual
benchmarks, this is all blabber.
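For reference, the copy-on-write mapping asked about above can be approximated from user space with a private mmap. The following is only a minimal sketch in Python, not something proposed in this thread: the 8 kB size, the temporary file, and the function names are assumptions made for illustration, and it measures nothing about the per-fault cost.

```python
import mmap
import os
import tempfile

PAGE = 8192  # assumed block size, chosen to match PostgreSQL's default 8 kB pages


def demo_private_mapping(path):
    """Map a file copy-on-write, modify the copy, then write it back explicitly."""
    fd = os.open(path, os.O_RDWR)
    try:
        # ACCESS_COPY requests a private, copy-on-write mapping:
        # stores go to a per-process copy, never to the file.
        view = mmap.mmap(fd, PAGE, access=mmap.ACCESS_COPY)
        try:
            view[0:5] = b"dirty"                 # modifies only the private copy
            on_disk_before = os.pread(fd, 5, 0)  # the file still holds the old bytes
            # Explicit write-back at a point the application chooses,
            # done here with an ordinary pwrite() of the private copy.
            os.pwrite(fd, view[0:PAGE], 0)
            on_disk_after = os.pread(fd, 5, 0)
        finally:
            view.close()
    finally:
        os.close(fd)
    return on_disk_before, on_disk_after


def run_demo():
    # Hypothetical stand-in for a relation file: one 8 kB block starting with "clean".
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"clean" + b"\0" * (PAGE - 5))
        path = tmp.name
    try:
        return demo_private_mapping(path)
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print(run_demo())
```

Note that the write-back here is still a copy into the buffer cache through pwrite(), so this sketch does not avoid the copy discussed above.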
>> At the very worst, it may
>> introduce serious security and reliability implications, when
>> applications can destroy the consistency of the page cache (even if
>> full access rights are checked, there's still the possibility this
>> inconsistency might be exploitable).
> If you allow write() which just writes clean pages, I can not see
> where the extra security concerns are beyond what normal
> write can do.
I've worked on security enough to never dismiss any kind of
system-level inconsistency.
The fact that you can make user-land applications see different data
than kernel-land code has far-reaching consequences that are hard to
foresee.