From: | Hannu Krosing <hannu(at)2ndQuadrant(dot)com> |
---|---|
To: | Trond Myklebust <trondmy(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Joshua Drake <jd(at)commandprompt(dot)com>, Mel Gorman <mgorman(at)suse(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "lsf-pc(at)lists(dot)linux-foundation(dot)org" <lsf-pc(at)lists(dot)linux-foundation(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net> |
Subject: | Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance |
Date: | 2014-01-14 00:03:18 |
Message-ID: | 52D47EC6.4080007@2ndQuadrant.com |
Lists: | pgsql-hackers |
On 01/13/2014 09:53 PM, Trond Myklebust wrote:
> On Jan 13, 2014, at 15:40, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>
>> On 2014-01-13 15:15:16 -0500, Robert Haas wrote:
>>> On Mon, Jan 13, 2014 at 1:51 PM, Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:
>>>> I notice, Josh, that you didn't mention the problems many people
>>>> have run into with Transparent Huge Page defrag and with NUMA
>>>> access.
>>> Amen to that. Actually, I think NUMA can be (mostly?) fixed by
>>> setting zone_reclaim_mode; is there some other problem besides that?
>> I think that fixes some of the worst instances, but I've seen machines
>> spending horrible amounts of CPU (& BUS) time in page reclaim
>> nonetheless. If I analyzed it correctly it's in RAM << working set
>> workloads where RAM is pretty large and most of it is used as page
>> cache. The kernel ends up spending a huge percentage of time finding and
>> potentially defragmenting pages when looking for victim buffers.
>>
>>> On a related note, there's also the problem of double-buffering. When
>>> we read a page into shared_buffers, we leave a copy behind in the OS
>>> buffers, and similarly on write-out. It's very unclear what to do
>>> about this, since the kernel and PostgreSQL don't have intimate
>>> knowledge of what each other are doing, but it would be nice to solve
>>> somehow.
>> I've wondered before if there wouldn't be a chance for postgres to say
>> "my dear OS, that the file range 0-8192 of file x contains y, no need to
>> reread" and do that when we evict a page from s_b but I never dared to
>> actually propose that to kernel people...
> O_DIRECT was specifically designed to solve the problem of double buffering
> between applications and the kernel. Why are you not able to use that in these situations?
What is being asked for is the opposite of O_DIRECT: writing from a buffer inside
PostgreSQL to the Linux *buffer cache* while telling Linux that the contents are the
same as what is currently on disk, so it should never bother writing them back.
This would avoid the current double buffering between the PostgreSQL and Linux
buffer caches while still making use of the Linux cache when possible.
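To make the double-buffering point concrete, here is a toy model in C of the read path. It is a sketch, not kernel or PostgreSQL code: it only counts how many in-memory copies of a page exist. The names `struct two_level_cache`, `pg_read_page`, `copies_of` and the `exclusive` flag are hypothetical. `exclusive` stands for an assumed mode in which the kernel drops its copy once PostgreSQL holds the page, which is one way an interface like the one proposed could be used.

```c
#include <assert.h>
#include <stdbool.h>

#define NPAGES 16 /* pages in the toy relation file (assumption) */

/* Toy model of the two caches; not real kernel or PostgreSQL structures. */
struct two_level_cache {
    bool in_page_cache[NPAGES];     /* copy held by the Linux page cache */
    bool in_shared_buffers[NPAGES]; /* copy held in PostgreSQL shared_buffers */
    int disk_reads;
    bool exclusive; /* hypothetical: kernel drops its copy once PostgreSQL has it */
};

/* Read a page into shared_buffers via the page cache, as a buffered read() does. */
void pg_read_page(struct two_level_cache *c, int page)
{
    assert(page >= 0 && page < NPAGES);
    if (c->in_shared_buffers[page])
        return; /* already cached by PostgreSQL */
    if (!c->in_page_cache[page]) {
        c->disk_reads++; /* page-cache miss: the kernel reads from disk */
        c->in_page_cache[page] = true;
    }
    c->in_shared_buffers[page] = true; /* copied into user space */
    if (c->exclusive)
        c->in_page_cache[page] = false; /* assumed mode: no second copy kept */
}

/* Number of in-memory copies of a page across both caches (0, 1 or 2). */
int copies_of(const struct two_level_cache *c, int page)
{
    assert(page >= 0 && page < NPAGES);
    return (int)c->in_page_cache[page] + (int)c->in_shared_buffers[page];
}
```

With today's buffered reads, one `pg_read_page` call leaves two copies of the page in RAM; in the assumed `exclusive` mode it leaves one, which only pays off if the page can later be handed back to the kernel cheaply.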
The use case is pages that PostgreSQL has brought into its buffer cache but has
not modified. At some point they will be evicted from the PostgreSQL cache, but it
is likely that they will be needed again soon, so what is required is "writing
them back" to the original file, except that they should go no further than the
Linux cache: they should not really be written to disk, or marked dirty to be
written later, as they *already* are on the disk.
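The eviction side of this use case can be sketched as a toy model in C. Nothing here is a real interface: `struct page_cache`, `evict_buffer`, `flush_dirty` and the `hint_clean` flag are hypothetical, and `hint_clean` stands in for the proposed "this range already matches disk" call, which Linux does not provide today. Modified pages follow the normal buffered-write path (dirty in the page cache, flushed later); unmodified pages are handed back to the page cache without being marked dirty, so they cost no disk write.

```c
#include <assert.h>
#include <stdbool.h>

#define NPAGES 16 /* toy relation size (assumption) */

/* Toy page-cache state; not real kernel structures. */
struct page_cache {
    bool present[NPAGES]; /* the kernel holds a copy */
    bool dirty[NPAGES];   /* the kernel copy still needs writeback */
};

/* Simulate kernel writeback: every dirty page costs one disk write. */
int flush_dirty(struct page_cache *pc)
{
    int writes = 0;
    for (int p = 0; p < NPAGES; p++) {
        if (pc->dirty[p]) {
            pc->dirty[p] = false;
            writes++;
        }
    }
    return writes;
}

/* Evict a page from shared_buffers towards the kernel.
 * modified:   PostgreSQL changed the page while it was in shared_buffers.
 * hint_clean: hypothetical kernel support for "contents match disk". */
void evict_buffer(struct page_cache *pc, int page, bool modified, bool hint_clean)
{
    assert(page >= 0 && page < NPAGES);
    if (modified) {
        /* Normal write(): the page-cache copy becomes dirty. */
        pc->present[page] = true;
        pc->dirty[page] = true;
    } else if (hint_clean) {
        /* Proposed: install the copy as clean; an existing dirty flag
         * from an earlier write is left alone. */
        pc->present[page] = true;
    }
    /* Otherwise (today): PostgreSQL just discards its clean copy, and the
     * page stays cached only if the kernel still happens to have it. */
}
```

Evicting one modified and one unmodified page with `hint_clean` set leaves both cached by the kernel, but a subsequent `flush_dirty` performs only one write.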
It is probably OK to put them in the LRU position as they are "written" out from
PostgreSQL, though it may be better if we get some more control over where in the
LRU order they are placed. It may make sense to place them based on when they were
last read while residing in the PostgreSQL cache.
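The placement idea in the last paragraph can be illustrated with a small ordered list: rather than using one fixed position at the moment of write-out, a handed-back page is inserted according to the time it was last read inside PostgreSQL. This is a hypothetical sketch: a sorted array stands in for the kernel's LRU lists (which are considerably more complex), and `last_read` is an assumed logical clock value that PostgreSQL would pass along with the page.

```c
#include <assert.h>
#include <stddef.h>

#define CAP 8 /* toy LRU capacity (assumption) */

struct lru_entry {
    int page;
    long last_read; /* hypothetical logical timestamp supplied by PostgreSQL */
};

/* entries[0] is the least recently used (next victim); entries[n-1] the most recent. */
struct lru {
    struct lru_entry entries[CAP];
    size_t n;
};

/* Remove and return the least recently used page, or -1 if the list is empty. */
int lru_evict(struct lru *l)
{
    if (l->n == 0)
        return -1;
    int victim = l->entries[0].page;
    for (size_t i = 1; i < l->n; i++)
        l->entries[i - 1] = l->entries[i];
    l->n--;
    return victim;
}

/* Insert a page at the position matching its last_read time, evicting the
 * oldest entry first if the list is full. Ties go after existing entries. */
void lru_insert_by_age(struct lru *l, int page, long last_read)
{
    if (l->n == CAP)
        (void)lru_evict(l);
    assert(l->n < CAP);
    size_t pos = l->n;
    while (pos > 0 && l->entries[pos - 1].last_read > last_read) {
        l->entries[pos] = l->entries[pos - 1]; /* shift newer entries right */
        pos--;
    }
    l->entries[pos].page = page;
    l->entries[pos].last_read = last_read;
    l->n++;
}
```

For example, pages handed back with `last_read` values 100, 300 and 200 (in that order) are evicted in the order 100, 200, 300, so a page that PostgreSQL had not touched for a long time does not displace recently used ones just because it was evicted late.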
Cheers
--
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ