Re: Strange situation on slave server

From: Condor <condor(at)stz-bg(dot)com>
To: pgsql-general(at)postgresql(dot)org
Cc: pgsql-general-owner(at)postgresql(dot)org
Subject: Re: Strange situation on slave server
Date: 2015-07-02 19:13:17
Message-ID: 01e63ea04adf82439f1eb942b21da80b@stz-bg.com
Lists: pgsql-general

On 01-07-2015 13:53, Condor wrote:
> Hello,
>
> I have a master–slave hot standby replication setup. Both servers run
> Slackware64-current Linux with PostgreSQL 9.4.4.
>
> Today, when I logged in to check some things on the slave server, top
> showed the replication process using 26% of memory. That seemed strange
> to me, so I restarted the server. When I checked again after 40 minutes,
> the replication process was slowly growing again, by about 1.0 (%MEM)
> every 10 minutes.
>
>
> postgres 14162 0.0 0.7 6514656 190684 ? S 11:50 0:00
> /usr/bin/postgres -D /var/lib/postgres/database -F
> postgres 14163 0.4 5.4 6517856 1341268 ? Ss 11:50 0:26 \_
> postgres: startup process recovering 00000001000005E50000005E
> postgres 14166 0.1 5.4 6517876 1335980 ? Ss 11:50 0:09 \_
> postgres: checkpointer process
> postgres 14168 0.0 0.2 6517732 53276 ? Ss 11:50 0:01 \_
> postgres: writer process
> postgres 14180 0.0 0.0 29888 2864 ? Ss 11:50 0:01 \_
> postgres: stats collector process
> postgres 14183 0.2 0.0 6522488 5584 ? Ss 11:50 0:15 \_
> postgres: wal receiver process streaming 5E5/5EE36000
>
>
> Today our system load is about average, and the slave server receives
> one WAL file every minute. As you can see, the checkpointer and startup
> (recovering) processes are at 5.4% memory; while writing this email
> I checked them again, and now they are at 6.0%.
>
>
> postgres 14162 0.0 0.7 6514656 190684 ? S 11:50 0:00
> /usr/bin/postgres -D /var/lib/postgres/database -F
> postgres 14163 0.4 6.0 6517856 1499868 ? Ss 11:50 0:30 \_
> postgres: startup process recovering 00000001000005E500000072
> postgres 14166 0.1 6.0 6517876 1497064 ? Ss 11:50 0:10 \_
> postgres: checkpointer process
> postgres 14168 0.0 0.2 6517732 53276 ? Ss 11:50 0:01 \_
> postgres: writer process
> postgres 14180 0.0 0.0 29888 2864 ? Ss 11:50 0:01 \_
> postgres: stats collector process
> postgres 14183 0.2 0.0 6522488 5584 ? Ss 11:50 0:18 \_
> postgres: wal receiver process streaming 5E5/72B49718
>
>
> There is still free memory:
> Mem: 24634464k total, 14674532k used, 9959932k free, 324108k
> buffers
> Swap: 6589196k total, 1872k used, 6587324k free, 11819140k cached
>
>
> I tried strace on the startup process:
>
> # strace -p 14163
>
> lseek(9, 0, SEEK_END) = 381943808
> lseek(23, 0, SEEK_END) = 376832
> lseek(5, 3563520, SEEK_SET) = 3563520
> read(5,
> "~\320\5\0\1\0\0\0\0`6C\345\5\0\0\344\7\0\0\0\0\0\0\0\0\0\0\333I\f\0"...,
> 8192) = 8192
> read(6, 0x7ffd2d7a672f, 1) = -1 EAGAIN (Resource
> temporarily unavailable)
> lseek(9, 0, SEEK_END) = 381943808
> lseek(5, 3571712, SEEK_SET) = 3571712
> read(5,
> "~\320\5\0\1\0\0\0\0\2006C\345\5\0\0^\6\0\0\0\0\0\0Ja.\0\20\0\220h"...,
> 8192) = 8192
> read(6, 0x7ffd2d7a672f, 1) = -1 EAGAIN (Resource
> temporarily unavailable)
> lseek(10, 0, SEEK_END) = 186392576
> read(6, 0x7ffd2d7a672f, 1) = -1 EAGAIN (Resource
> temporarily unavailable)
> lseek(11, 0, SEEK_END) = 182566912
> read(3, 0x7ffd2d7a60d0, 16) = -1 EAGAIN (Resource
> temporarily unavailable)
> poll([{fd=3, events=POLLIN}], 1, 5000) = ? ERESTART_RESTARTBLOCK
> (Interrupted by signal)
> --- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_USER, si_pid=14183,
> si_uid=1000} ---
> write(4, "\0", 1) = 1
> rt_sigreturn({mask=[]}) = -1 EINTR (Interrupted system
> call)
> read(3, "\0", 16) = 1
> read(6, 0x7ffd2d7a611f, 1) = -1 EAGAIN (Resource
> temporarily unavailable)
> lseek(5, 3579904, SEEK_SET) = 3579904
> read(5,
> "~\320\5\0\1\0\0\0\0\2406C\345\5\0\0\232\5\0\0\0\0\0\0\0\0\0\0*\231\1\0"...,
> 8192) = 8192
> read(6, 0x7ffd2d7a672f, 1) = -1 EAGAIN (Resource
> temporarily unavailable)
> lseek(12, 0, SEEK_END) = 203612160
> read(6, 0x7ffd2d7a672f, 1) = -1 EAGAIN (Resource
> temporarily unavailable)
> lseek(13, 0, SEEK_END) = 331071488
> read(6, 0x7ffd2d7a672f, 1) = -1 EAGAIN (Resource
> temporarily unavailable)
> lseek(14, 0, SEEK_END) = 193331200
> read(6, 0x7ffd2d7a672f, 1) = -1 EAGAIN (Resource
> temporarily unavailable)
> lseek(15, 0, SEEK_END) = 271171584
> read(6, 0x7ffd2d7a672f, 1) = -1 EAGAIN (Resource
> temporarily unavailable)
> lseek(16, 0, SEEK_END) = 187580416
> read(6, 0x7ffd2d7a672f, 1) = -1 EAGAIN (Resource
> temporarily unavailable)
> lseek(17, 0, SEEK_END) = 193257472
> read(6, 0x7ffd2d7a672f, 1) = -1 EAGAIN (Resource
> temporarily unavailable)
> lseek(18, 0, SEEK_END) = 277381120
> read(6, 0x7ffd2d7a672f, 1) = -1 EAGAIN (Resource
> temporarily unavailable)
> lseek(19, 0, SEEK_END) = 199884800
> read(6, 0x7ffd2d7a672f, 1) = -1 EAGAIN (Resource
> temporarily unavailable)
> lseek(20, 0, SEEK_END) = 193396736
> read(6, 0x7ffd2d7a672f, 1) = -1 EAGAIN (Resource
> temporarily unavailable)
> lseek(21, 0, SEEK_END) = 204808192
> read(6, 0x7ffd2d7a672f, 1) = -1 EAGAIN (Resource
> temporarily unavailable)
> lseek(22, 0, SEEK_END) = 193355776
> read(6, 0x7ffd2d7a672f, 1) = -1 EAGAIN (Resource
> temporarily unavailable)
> read(6, 0x7ffd2d7a672f, 1) = -1 EAGAIN (Resource
> temporarily unavailable)
> lseek(9, 0, SEEK_END) = 381943808
> lseek(23, 0, SEEK_END) = 376832
> read(6, 0x7ffd2d7a672f, 1) = -1 EAGAIN (Resource
> temporarily unavailable)
> lseek(9, 0, SEEK_END) = 381943808
> read(6, 0x7ffd2d7a672f, 1) = -1 EAGAIN (Resource
> temporarily unavailable)
> lseek(24, 0, SEEK_END) = 374423552
> read(6, 0x7ffd2d7a672f, 1) = -1 EAGAIN (Resource
> temporarily unavailable)
> lseek(25, 0, SEEK_END) = 1777664
> read(6, 0x7ffd2d7a672f, 1) = -1 EAGAIN (Resource
> temporarily unavailable)
> read(6, 0x7ffd2d7a672f, 1) = -1 EAGAIN (Resource
> temporarily unavailable)
> lseek(24, 0, SEEK_END) = 374423552
> read(6, 0x7ffd2d7a672f, 1) = -1 EAGAIN (Resource
> temporarily unavailable)
> read(3, 0x7ffd2d7a60d0, 16) = -1 EAGAIN (Resource
> temporarily unavailable)
> poll([{fd=3, events=POLLIN}], 1, 5000) = ? ERESTART_RESTARTBLOCK
> (Interrupted by signal)
> --- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_USER, si_pid=14183,
> si_uid=1000} ---
> write(4, "\0", 1) = 1
> rt_sigreturn({mask=[]}) = -1 EINTR (Interrupted system
> call)
>
>
> ...and so on.
>
> This slave server is used only for mirroring the database; no client
> connections or queries are ever run on it.
>
> My question: is this normal?
>
>
>
> Regards,
> Hristo Simeonov

Hello again,
sorry to bump this thread; here are my latest statistics:

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
postgres 14162 0.0 0.7 6514656 188168 ? S Jul01 0:00
/usr/bin/postgres -D /var/lib/postgres/database -F
postgres 14163 0.4 26.1 6518012 6437776 ? Ss Jul01 10:04
postgres: startup process recovering 00000001000005F10000007A
postgres 14166 0.1 25.9 6518516 6396756 ? Ss Jul01 3:00
postgres: checkpointer process
postgres 14168 0.0 15.0 6517872 3708012 ? Ss Jul01 0:23
postgres: writer process
postgres 14180 0.0 0.0 29888 1992 ? Ss Jul01 0:13
postgres: stats collector process
postgres 14183 0.2 0.0 6522488 4064 ? Ss Jul01 5:36
postgres: wal receiver process streaming 5F1/7A854300

Last wal file was received 10 minutes ago.

Free memory now:
Mem: 24634464k total, 19669032k used, 4965432k free, 350428k buffers
Swap: 6589196k total, 2704k used, 6586492k free, 17825832k cached

Does anyone have an idea why the checkpointer and writer processes take
so much memory while apparently idle? Is that normal?
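One thing worth checking before assuming a leak: on Linux, the RSS column
in ps/top counts shared-memory pages a process has touched, so the startup
and checkpointer processes will appear to "grow" toward the full size of
shared_buffers as recovery and checkpoints touch more of the buffer pool,
even though that memory is shared once, not allocated per process. A rough
sketch to split a process's resident set into private vs shared pages
(assuming a Linux kernel that exposes /proc/<pid>/smaps; it uses the
current shell's own PID as a stand-in, substitute the startup or
checkpointer PID from ps):

```shell
#!/bin/sh
# Split a process's resident set into private vs shared pages using
# /proc/<pid>/smaps. Uses the current shell ($$) as a stand-in; on the
# standby you would substitute the PID from ps, e.g. 14163.
pid=$$
awk '/^Private_(Clean|Dirty):/ {priv += $2}
     /^Shared_(Clean|Dirty):/  {shr  += $2}
     END {printf "private: %d kB, shared: %d kB\n", priv, shr}' \
    "/proc/$pid/smaps"
```

If the "shared" number accounts for almost all of the RSS shown by top,
the growth is just buffer-pool pages being mapped in, which is expected.
Relatedly, the descriptors in the strace output can be resolved with
`ls -l /proc/14163/fd` — fd 5 should turn out to be the WAL segment being
replayed, and the fds probed with lseek(..., SEEK_END) the relation files
whose sizes recovery is checking.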

Cheers,
Hristo
