From: | Jamie Koceniak <jkoceniak(at)mediamath(dot)com> |
---|---|
To: | Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> |
Cc: | "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org> |
Subject: | Re: BUG #9635: Wal sender process is using 100% CPU |
Date: | 2014-05-06 16:44:42 |
Message-ID: | 84CE67AB46ACDA41B18094177DD0853D212A5A@ORD2MBX04C.mex05.mlsrvr.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
Hi Heikki,
Attached is a trace route for one of the pids that was running at 100% cpu.
Note: when the wal_senders hit 100% cpu, our slave servers tend to lag the master for up to around 3-4 minutes until the wal_sender cpu utilization drops. This is very problematic for us since we are trying to use the slaves for reads in our application.
Here is perf top results:
39.49% libz.so.1.2.7 [.] 0x000000000000301e
32.97% postgres [.] 0x00000000000cf150
6.50% postgres [.] nocachegetattr
2.63% postgres [.] btint4cmp
1.64% postgres [.] _start
1.40% [unknown] [.] 0x00007f9dfca6a34b
1.06% libc-2.13.so [.] 0x000000000012934b
0.94% postgres [.] slot_getattr
0.85% libcrypto.so.1.0.0 [.] 0x000000000009f48f
0.69% postgres [.] ExecProject
0.67% libz.so.1.2.7 [.] adler32
0.66% postgres [.] GetMemoryChunkSpace
0.58% [kernel] [k] copy_user_generic_string
0.45% postgres [.] heap_fill_tuple
0.38% postgres [.] execTuplesMatch
0.27% postgres [.] FunctionCall2Coll
0.27% postgres [.] ExecProcNode
0.25% postgres [.] heap_tuple_untoast_attr
0.22% postgres [.] MemoryContextAlloc
0.21% postgres [.] heap_form_minimal_tuple
0.21% postgres [.] pfree
0.19% postgres [.] heap_compute_data_size
Here is some data using perf record and perf report:
+ 33.93% 193019 postgres postgres [.] 0x0000000000369e27
+ 30.06% 173825 postgres libz.so.1.2.7 [.] 0x00000000000036c0
+ 10.21% 60618 postgres [unknown] [.] 0x00007f9dfc2fd529
+ 4.81% 27275 postgres postgres [.] nocachegetattr
+ 2.55% 14443 postgres postgres [.] btint4cmp
+ 1.22% 6916 postgres postgres [.] ExecProject
+ 1.20% 6837 postgres postgres [.] _start
+ 1.10% 6495 postgres libc-2.13.so [.] 0x000000000012b8e9
+ 0.84% 4760 postgres postgres [.] heap_fill_tuple
+ 0.74% 4265 postgres [kernel.kallsyms] [k] copy_user_generic_string
+ 0.71% 4001 postgres postgres [.] GetMemoryChunkSpace
+ 0.66% 3881 postgres libcrypto.so.1.0.0 [.] 0x0000000000081ec7
+ 0.63% 3673 postgres libz.so.1.2.7 [.] adler32
+ 0.43% 2448 postgres postgres [.] slot_getattr
+ 0.40% 2277 postgres postgres [.] ExecProcNode
+ 0.40% 2250 postgres postgres [.] heap_form_minimal_tuple
+ 0.37% 2078 postgres postgres [.] heap_compute_data_size
+ 0.31% 1739 postgres postgres [.] ExecScan
+ 0.27% 1779 postgres [kernel.kallsyms] [k] page_fault
+ 0.24% 1374 postgres postgres [.] LogicalTapeWrite
+ 0.24% 4206 swapper [kernel.kallsyms] [k] intel_idle
+ 0.21% 1206 postgres postgres [.] slot_getsomeattrs
Also, here are our wal settings on the master.
Do these settings appear ok?
wal_block_size | 8192 |
wal_buffers | 16MB |
wal_keep_segments | 5000 |
wal_level | hot_standby |
wal_receiver_status_interval | 10s |
wal_segment_size | 16MB |
wal_sender_delay | 1s |
wal_sync_method | fdatasync |
wal_writer_delay | 200ms |
Thanks!
-----Original Message-----
From: Heikki Linnakangas [mailto:hlinnakangas(at)vmware(dot)com]
Sent: Tuesday, May 06, 2014 4:01 AM
To: Jamie Koceniak
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: [BUGS] BUG #9635: Wal sender process is using 100% CPU
On 05/05/2014 10:37 PM, Jamie Koceniak wrote:
> Hi Heikki,
>
> I performed an strace on one of the pids that was consuming 100% of the cpu.
> This is just a portion of the log.
> I also trimmed out some data provided in the quotes in the read and sends, not sure if you need it.
> A majority of the calls in the trace log were recvfrom() and sendto().
> EAGAIN (Resource temporarily unavailable) came up time and time again.
>
> # strace log during 100% cpu utilization on the master
Hmm, that trace looks normal to me. It doesn't look like it's busy-looping, as I would expect if it's using 100% of the CPU.
Can you post the full log, or at least a larger chunk? I don't need the data in the quotes from read() and sends() that you removed, but I would like to see the trace from a longer period of time to see what the pattern is.
- Heikki
Attachment | Content-Type | Size |
---|---|---|
wal_sender_trace.txt | text/plain | 920.5 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Heikki Linnakangas | 2014-05-06 17:32:35 | Re: BUG #9635: Wal sender process is using 100% CPU |
Previous Message | Alvaro Herrera | 2014-05-06 16:35:17 | Re: Problem with PostgreSQL 9.2.7 and make check on AIX 7.1 |