Re: high io BUT huge amount of free memory

From: Shaun Thomas <sthomas(at)optionshouse(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Миша Тюрин <tmihail(at)bk(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: high io BUT huge amount of free memory
Date: 2013-04-24 13:39:09
Message-ID: 5177E07D.3090709@optionshouse.com
Lists: pgsql-hackers

On 04/24/2013 08:24 AM, Robert Haas wrote:

> Are you referring to the fact that vm.zone_reclaim_mode = 1 is an
> idiotic default?

Well... it is. But even on systems where it's not the default or is
explicitly disabled, there's just something hideously wrong with NUMA in
general. Take a look at our numa distribution on a heavily loaded system:

available: 2 nodes (0-1)
node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22
node 0 size: 36853 MB
node 0 free: 14315 MB
node 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23
node 1 size: 36863 MB
node 1 free: 300 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10

What the hell? Seriously? Using numactl and starting in interleave
didn't fix this, either. It just... arbitrarily ignores a huge chunk of
memory for no discernible reason.
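
To put a number on that imbalance, here is a minimal sketch that parses the size/free lines from the output above and reports the free fraction per node. The function names (parse_numa_memory, free_fraction) are made up for illustration, and it assumes the text has the same "node N size: X MB" shape as the listing shown here; other numactl versions may format it differently.

```python
import re

# Size/free lines copied from the listing above.
SAMPLE = """\
node 0 size: 36853 MB
node 0 free: 14315 MB
node 1 size: 36863 MB
node 1 free: 300 MB
"""

# Assumed line shape: "node <id> size|free: <n> MB".
_LINE = re.compile(r"^node (\d+) (size|free): (\d+) MB[ \t]*$", re.MULTILINE)


def parse_numa_memory(text):
    """Return {node_id: {"size": MB, "free": MB}} for every node found."""
    nodes = {}
    for node, field, mb in _LINE.findall(text):
        nodes.setdefault(int(node), {})[field] = int(mb)
    return nodes


def free_fraction(nodes):
    """Return {node_id: free/size}, skipping nodes with incomplete data."""
    return {
        node: info["free"] / info["size"]
        for node, info in nodes.items()
        if info.get("size") and "free" in info
    }


if __name__ == "__main__":
    for node, frac in sorted(free_fraction(parse_numa_memory(SAMPLE)).items()):
        print(f"node {node}: {frac:.1%} free")
```

On a live box the same functions could be fed the captured output of numactl --hardware instead of SAMPLE.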

The memory pressure code in Linux is extremely fucked up. I can't find
it right now, but the memory management algorithm makes some pretty
ridiculous assumptions once you pass half memory usage, regarding what
is in active and inactive cache.

I hate to rant, but it gets clearer to me every day that Linux is
optimized for desktop systems, and generally only kinda works for
servers. Once you start throwing vast amounts of memory, CPU, and
processes at it though, things start to get unpredictable.

That all goes back to my earlier threads, where disabling process
autogrouping via the kernel.sched_autogroup_enabled setting magically
gave us 20-30% better performance. The optimal setting for a server is
clearly to disable process autogrouping, and yet it's enabled by
default, and strongly advocated by Linus himself as a vast improvement.
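
For checking where a given box stands, a rough sketch follows. It reads the two settings named in this thread from /proc/sys and compares them with the values argued for here (both 0). The target values are an assumption drawn from this discussion, not a universal recommendation, and read_sysctl and check_settings are hypothetical helper names. Mapping dots to slashes works for these two names but not for every sysctl key.

```python
from pathlib import Path

# Assumed targets, taken from the discussion in this thread:
# zone reclaim off, process autogrouping off.
WANTED = {
    "vm.zone_reclaim_mode": "0",
    "kernel.sched_autogroup_enabled": "0",
}


def read_sysctl(name, root="/proc/sys"):
    """Return the current value as a string, or None if it cannot be read."""
    path = Path(root) / name.replace(".", "/")
    try:
        return path.read_text().strip()
    except OSError:
        # Missing on kernels built without the feature, or not on Linux.
        return None


def check_settings(wanted=WANTED, root="/proc/sys"):
    """Return {name: (current, wanted, matches)} for each setting."""
    report = {}
    for name, target in wanted.items():
        current = read_sysctl(name, root)
        report[name] = (current, target, current == target)
    return report


if __name__ == "__main__":
    for name, (current, target, ok) in check_settings().items():
        status = "ok" if ok else "differs"
        print(f"{name}: current={current} wanted={target} ({status})")
```

This only reads the values. Changing them still means sysctl -w or an entry in /etc/sysctl.conf.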

I get it. It's better for desktop systems. But the LAMP stack alone has
probably a couple orders of magnitude more use cases than Joe Blow's
Pentium 4 in his basement. Yet it's the latter case that's optimized for.

Servers are getting shafted in a lot of cases, and it's actually
starting to make me angry.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
sthomas(at)optionshouse(dot)com

