Re: Debugging shared memory issues on CentOS

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Mack Talcott <mack(dot)talcott(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Debugging shared memory issues on CentOS
Date: 2013-12-11 04:54:36
Message-ID: 3716.1386737676@sss.pgh.pa.us
Lists: pgsql-performance

Mack Talcott <mack(dot)talcott(at)gmail(dot)com> writes:
> I am trying to debug some shared memory issues with Postgres 9.3.1 and
> CentOS release 6.3 (Final). I have a database machine that probably has
> some misconfigured shared memory settings. It's getting into 2+ GB of
> swap. Restarting postgres frees all of the memory, but after a few hours
> of normal usage it will go back into swap.

Are you sure the kernel isn't just swapping out some idle processes
because it feels like it? These numbers don't exactly look like a
machine under stress:

> top - 09:38:16 up 1 day, 21:21, 3 users, load average: 0.40, 0.54, 0.45
> Tasks: 253 total, 2 running, 251 sleeping, 0 stopped, 0 zombie
> Cpu(s): 0.7%us, 0.2%sy, 0.0%ni, 97.8%id, 1.2%wa, 0.0%hi, 0.0%si, 0.0%st
> Mem: 6998260k total, 6849048k used, 149212k free, 248k buffers
> Swap: 440478516k total, 1981912k used, 438496604k free, 1541356k cached

In particular, you've got 1.5 gig of filesystem cache, so you're hardly
out of memory. I don't know where the other 5.5 gig of RAM went, but
it doesn't look like postgres is eating it; what else is running on
this box?

These lines look absolutely normal, assuming that you've configured
shared_buffers somewhere in the neighborhood of 1GB:

>   PID USER     PR NI  VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
>  3534 postgres 20  0 2330m 1.4g 1.1g S  0.0 20.4 1:06.99 postgres: deploy mtalcott 10.222.154.172(53495) idle
>  9143 postgres 20  0 2221m 1.1g 983m S  0.0 16.9 0:14.75 postgres: deploy mtalcott 10.222.154.167(35811) idle
>  6026 postgres 20  0 2341m 1.1g 864m S  0.0 16.4 0:46.56 postgres: deploy mtalcott 10.222.154.167(37110) idle
> 18538 postgres 20  0 2327m 1.1g 865m S  0.0 16.1 2:06.59 postgres: deploy mtalcott 10.222.154.172(47796) idle
>  1575 postgres 20  0 2358m 1.1g 858m S  0.0 15.9 1:41.76 postgres: deploy mtalcott 10.222.154.172(52560) idle

The key thing to realize about that is that the SHR column is *shared*
memory, i.e., all these processes are referencing the same chunk of about 1GB
worth of memory. The process-specific memory is RES minus SHR, and none
of those processes seem tremendously out of line on that measure. (Note:
the fact that the SHR values aren't all exactly the same is because top
doesn't count a shared page until the process has physically touched that
page. Even the guy with 1.1g of SHR might not have touched all of the
shared storage yet.)
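
To make the RES-minus-SHR arithmetic concrete, here is a rough sketch in
Python that applies it to the five rows quoted above. The helper names
(parse_top_size, private_mib) are made up for illustration, and since top
rounds its figures (1.1g could be anything near that), the results are
only approximate.

```python
def parse_top_size(text):
    """Convert a top memory field such as '983m' or '1.4g' to MiB.

    Assumption: a bare number with no suffix is in KiB, which is how
    top prints small values by default.
    """
    units = {"k": 1.0 / 1024, "m": 1.0, "g": 1024.0, "t": 1024.0 * 1024}
    suffix = text[-1].lower()
    if suffix in units:
        return float(text[:-1]) * units[suffix]
    return float(text) / 1024.0


def private_mib(res, shr):
    """Process-specific memory in MiB: resident size minus the shared part."""
    return parse_top_size(res) - parse_top_size(shr)


# (PID, RES, SHR) copied from the quoted top output.
rows = [
    (3534, "1.4g", "1.1g"),
    (9143, "1.1g", "983m"),
    (6026, "1.1g", "864m"),
    (18538, "1.1g", "865m"),
    (1575, "1.1g", "858m"),
]

for pid, res, shr in rows:
    print(f"{pid:>6}  process-specific ~ {private_mib(res, shr):.0f} MiB")
```

Every backend comes out at a few hundred MiB or less of its own memory,
which is the sense in which none of them looks out of line.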

I'm not sure you have a problem here. If you do, these figures aren't
showing it. Having some stuff shoved out to swap is not a problem unless
you have a problem with the swap I/O rate. You might try watching "vmstat
1" for a while to see if the si/so columns show significant activity.
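
If eyeballing the output gets tedious, something along these lines could
pull out just the si/so columns. It is a sketch, not a polished tool: the
function names swap_rates and watch are hypothetical, and it assumes the
usual procps vmstat layout in which a header line names the columns.

```python
import subprocess


def swap_rates(vmstat_lines):
    """Return a list of (si, so) pairs from lines of vmstat output.

    The column positions are taken from the header line that contains
    both 'si' and 'so'; banner and repeated header lines are skipped.
    """
    si_idx = so_idx = None
    samples = []
    for line in vmstat_lines:
        fields = line.split()
        if "si" in fields and "so" in fields:
            si_idx, so_idx = fields.index("si"), fields.index("so")
            continue
        if si_idx is None or not fields or not fields[0].isdigit():
            continue  # banner line, blank line, or data before any header
        if len(fields) <= max(si_idx, so_idx):
            continue  # truncated line
        samples.append((int(fields[si_idx]), int(fields[so_idx])))
    return samples


def watch(count=10):
    """Run 'vmstat 1 <count>' and print samples with nonzero swap traffic."""
    out = subprocess.run(
        ["vmstat", "1", str(count)],
        capture_output=True, text=True, check=True,
    ).stdout
    # The first vmstat sample is an average since boot, so drop it.
    for si, so in swap_rates(out.splitlines())[1:]:
        if si or so:
            print(f"swap-in {si}/s  swap-out {so}/s")
```

Calling watch(60) samples for about a minute. If it prints nothing, or
only the occasional small number, the swap usage is just idle pages
parked on disk rather than active thrashing.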

regards, tom lane
