From: | Jan Kara <jack(at)suse(dot)cz> |
---|---|
To: | Hannu Krosing <hannu(at)2ndQuadrant(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, James Bottomley <James(dot)Bottomley(at)hansenpartnership(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, "lsf-pc(at)lists(dot)linux-foundation(dot)org" <lsf-pc(at)lists(dot)linux-foundation(dot)org>, Kevin Grittner <kgrittn(at)ymail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Dave Chinner <david(at)fromorbit(dot)com>, Joshua Drake <jd(at)commandprompt(dot)com>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Mel Gorman <mgorman(at)suse(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Trond Myklebust <trondmy(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net> |
Subject: | Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance |
Date: | 2014-01-15 13:01:58 |
Message-ID: | 20140115130158.GA9141@quack.suse.cz |
Lists: | pgsql-hackers |
On Wed 15-01-14 12:16:50, Hannu Krosing wrote:
> On 01/14/2014 06:12 PM, Robert Haas wrote:
> > This would be pretty similar to copy-on-write, except
> > without the copying. It would just be
> > forget-from-the-buffer-pool-on-write.
>
> +1
>
> A version of this could probably already be implemented using MADV_DONTNEED
> and MADV_WILLNEED.
>
> That is, just after reading the page in, use MADV_DONTNEED on it. When
> evicting a clean page, check that it is still in cache and, if it is,
> MADV_WILLNEED it.
>
> Another nice thing to do would be dynamically adjusting the kernel's
> dirty_background_ratio and other related knobs in real time, based on how
> many buffers are dirty inside PostgreSQL. Maybe in the background writer.
>
> Question to LKM folks - will the kernel react well to frequent changes to
> /proc/sys/vm/dirty_* ?
> How frequent can they be (every few seconds? every second? 100 Hz?)
So the question is what you mean by 'react'. We check whether we should
start background writeback every dirty_writeback_centisecs (5s by default).
We also check whether we have exceeded the background dirty limit (and wake
the writeback thread if so) when dirtying pages. However, this check happens
only once per several dirtied MB (unless we are close to dirty_bytes).

When writeback is running, we check roughly once per second (the logic is
more complex there, but I don't think explaining the details would be useful
here) whether we are below dirty_background_bytes, and stop writeback in
that case.
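To see what the checks described above are currently working with on a given machine, the knobs can be read straight from /proc. The following is only a minimal sketch, assuming Python on Linux; the helper names `read_vm_knobs` and `periodic_check_interval_s` are hypothetical and not from any existing tool.

```python
from pathlib import Path

# Writeback knobs mentioned in this thread; on Linux they live under /proc/sys/vm.
KNOBS = ("dirty_writeback_centisecs", "dirty_background_bytes", "dirty_bytes")


def read_vm_knobs(root="/proc/sys/vm"):
    """Return {knob: int} for each knob that exists and parses under root."""
    values = {}
    for name in KNOBS:
        try:
            values[name] = int((Path(root) / name).read_text().strip())
        except (OSError, ValueError):
            continue  # knob missing (e.g. non-Linux) or unreadable: skip it
    return values


def periodic_check_interval_s(knobs):
    """Seconds between periodic background-writeback checks, or None if unknown."""
    centisecs = knobs.get("dirty_writeback_centisecs")
    return None if centisecs is None else centisecs / 100.0


if __name__ == "__main__":
    knobs = read_vm_knobs()
    print(knobs, periodic_check_interval_s(knobs))
```

Note that a value of 0 in dirty_background_bytes or dirty_bytes means the corresponding *_ratio knob is in effect instead.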
So changing dirty_background_bytes every few seconds should work
reasonably, once a second is pushing it, and 100 Hz - no way. But I'd also
note that you have conflicting requirements on kernel writeback. On one
hand, you want checkpoint data to steadily trickle to disk (well, trickle
isn't exactly the proper word, since if you need to checkpoint 16 GB every 5
minutes then you need a steady throughput of ~50 MB/s just for
checkpointing), so you want to set dirty_background_bytes low; on the other
hand, you don't want temporary files to get to disk, so you want to set
dirty_background_bytes high. Also, changes to dirty_background_bytes
probably will not take into account other events happening on the system
(maybe a DB backup is running...). So I'm somewhat skeptical you will be
able to tune dirty_background_bytes frequently in a useful way.
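For illustration only, here is a rough sketch of the two numbers in play: the steady throughput a checkpoint implies, and an updater that refuses to touch the knob more often than every few seconds. It assumes Python on Linux; `checkpoint_throughput_mb_s` and `ThrottledKnobWriter` are hypothetical names, and the 5-second minimum interval is an assumed reading of "every few seconds", not a kernel-defined limit.

```python
import time


def checkpoint_throughput_mb_s(checkpoint_bytes, interval_s):
    """Average write rate in MB/s (1 MB = 10**6 bytes) needed to flush
    checkpoint_bytes within interval_s seconds."""
    return checkpoint_bytes / interval_s / 1e6


class ThrottledKnobWriter:
    """Hypothetical helper: writes an integer to a sysctl file, but at most
    once per min_interval_s; updates that arrive too soon are dropped."""

    def __init__(self, path="/proc/sys/vm/dirty_background_bytes",
                 min_interval_s=5.0, clock=time.monotonic):
        self.path = path
        self.min_interval_s = min_interval_s  # assumption: "every few seconds"
        self._clock = clock
        self._last_write = None

    def set(self, value):
        """Write value if enough time has passed; return True if written."""
        now = self._clock()
        if (self._last_write is not None
                and now - self._last_write < self.min_interval_s):
            return False  # too soon since the last write: skip this update
        with open(self.path, "w") as f:  # the real /proc file needs root
            f.write(f"{int(value)}\n")
        self._last_write = now
        return True


if __name__ == "__main__":
    # 16 GiB every 5 minutes
    print(round(checkpoint_throughput_mb_s(16 * 2**30, 5 * 60), 1))
```

With 16 GiB and a 5-minute interval the first function gives roughly 57 MB/s, in line with the ~50 MB/s estimate. The throttle only limits how often the knob is written; it does nothing about the conflicting requirements or the other system activity mentioned above.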
Honza
--
Jan Kara <jack(at)suse(dot)cz>
SUSE Labs, CR