From: | Kevin Grittner <kgrittn(at)ymail(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org> |
Cc: | Ants Aasma <ants(at)cybertec(dot)at>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Vacuum, Freeze and Analyze: the big picture |
Date: | 2013-06-03 22:16:49 |
Message-ID: | 1370297809.95780.YahooMailNeo@web162906.mail.bf1.yahoo.com |
Lists: | pgsql-hackers |
Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> Martijn van Oosterhout <kleptog(at)svana(dot)org> wrote:
>> On Mon, Jun 03, 2013 at 11:27:57AM +0300, Ants Aasma wrote:
>>> It could be related to the OS. I have no evidence for or against, but
>>> it's possible that OS write-out routines defeat the careful cost based
>>> throttling that PostgreSQL does by periodically dumping a large
>>> portion of dirty pages into the write queue at once. That does nasty
>>> things to query latencies as evidenced by the work on checkpoint
>>> spreading.
>>
>> In other contexts I've run into issues relating to large continuous
>> writes stalling. The issue is basically that the Linux kernel allows
>> (by default) writes to pile up until they reach 5% of physical memory
>> before deciding that the sucker who wrote the last block becomes
>> responsible for writing the whole lot out. At full speed of course.
>> Depending on the amount of memory and the I/O speed of your disks this
>> could take a while, and interfere with other processes.
>>
>> This leads to extremely bursty I/O behaviour.
>>
>> The solution, as usual, is to make it more aggressive, so the
>> kernel background writer triggers at 1% memory.
>>
>> I'm not saying that's the problem here, but it is an example of a
>> situation where the write queue can become very large very quickly.
>
> Yeah. IMHO, the Linux kernel's behavior around the write queue is
> flagrantly insane. The threshold for background writing really seems
> like it ought to be zero. I can see why it makes sense to postpone
> writing back dirty data if we're otherwise starved for I/O.

I imagine the reason the OS guys would give for holding off on disk
writes for as long as possible would sound an awful lot like the
reason PostgreSQL developers give for doing it. Keep in mind that
the OS doesn't know whether there might or might not be another
layer of caching (on a battery-backed RAID controller or SSD).
It's trying to minimize disk writes by waiting, to improve
throughput by collapsing duplicate writes and allowing the writes
to be performed in a more efficient order based on physical layout.

> But it seems like the kernel is disposed to cache large amounts
> of dirty data for an unbounded period of time even if the I/O
> system is completely idle,

It's not unbounded time. Last I heard, the default was 30 seconds.

> and it's difficult to imagine what class of user would find that
> behavior desirable.

Well, certainly not a user of a database that keeps dirty pages
lingering for five minutes by default, and often increases that to
minimize full page writes. IMO, our defaults for bgwriter are far
too passive.
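
For anyone who wants to check the kernel numbers discussed above on
their own machine, here is a rough sketch, not a tested tool. It
assumes a Linux box exposing the usual vm sysctls under /proc/sys/vm
(dirty_background_ratio, dirty_ratio, dirty_expire_centisecs). The
function names are made up for illustration, and the byte figures are
only an approximation: the kernel applies the ratios to dirtyable
memory rather than to all of physical RAM, and the *_bytes variants of
these settings, when non-zero, take precedence over the ratios.

```python
import os
from pathlib import Path

PROC_VM = Path("/proc/sys/vm")  # standard location on Linux


def read_vm_setting(name, base=PROC_VM):
    """Return a vm sysctl as an int, or None if it cannot be read."""
    try:
        return int((base / name).read_text().strip())
    except (OSError, ValueError):
        return None


def describe_writeback(settings, total_mem_bytes):
    """Convert raw sysctl values into approximate bytes and seconds.

    Approximation: ratios are applied to total_mem_bytes, whereas the
    kernel uses dirtyable memory, so real thresholds are a bit lower.
    """
    out = {}
    background = settings.get("dirty_background_ratio")
    if background is not None:
        # Dirty data above this starts background writeback.
        out["background_threshold_bytes"] = total_mem_bytes * background // 100
    blocking = settings.get("dirty_ratio")
    if blocking is not None:
        # Dirty data above this makes the writing process do writeback.
        out["blocking_threshold_bytes"] = total_mem_bytes * blocking // 100
    expire = settings.get("dirty_expire_centisecs")
    if expire is not None:
        # Age after which dirty data becomes eligible for writeout.
        out["expire_seconds"] = expire / 100
    return out


if __name__ == "__main__":
    names = ("dirty_background_ratio", "dirty_ratio", "dirty_expire_centisecs")
    settings = {name: read_vm_setting(name) for name in names}
    try:
        total = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, OSError, ValueError):
        total = None  # not available on this platform
    if total is None:
        print("raw settings:", settings)
    else:
        for key, value in describe_writeback(settings, total).items():
            print(f"{key}: {value}")
```

Comparing expire_seconds from this against checkpoint_timeout in
postgresql.conf gives a rough feel for how long each layer is willing
to sit on dirty data.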
--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company