From: | PG Bug reporting form <noreply(at)postgresql(dot)org> |
---|---|
To: | pgsql-bugs(at)lists(dot)postgresql(dot)org |
Cc: | juanjo(dot)santamaria(at)gmail(dot)com |
Subject: | BUG #15438: Standby corruption after "Too many open files in system" error |
Date: | 2018-10-18 18:36:42 |
Message-ID: | 15438-bbe4679a595d795f@postgresql.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
The following bug has been logged on the website:
Bug reference: 15438
Logged by: Juanjo Santamaría
Email address: juanjo(dot)santamaria(at)gmail(dot)com
PostgreSQL version: 10.4
Operating system: Redhat 7.3
Description:
After some time working under poor conditions, the open file limit was too
low for the postgres user, the streaming replica ended in a corrupted
state.
The primary was working the whole time and recreating the standby replica,
after configuring the user limits, seems to solve the issue.
The log is quite verbose, so I have tried to summarize at least one
occurrency for every error:
Oct 15 10:47:54 postgres[9507]: [9-1] LOG: started streaming WAL from
primary at 6/BE000000 on timeline 3
Oct 15 10:48:38 postgres[9507]: [10-1] FATAL: could not receive data from
WAL stream: ERROR: epoll_create1 failed: Too many open files in system
Oct 15 10:48:38 postgres[40098]: [15-1] LOG: record with incorrect
prev-link 6/A1A993C8 at 6/BEA99400
Oct 15 10:48:38 postgres[9550]: [9-1] FATAL: could not connect to the
primary server: server closed the connection unexpectedly
Oct 15 10:48:38 postgres[9550]: [9-2] #011#011This probably means the server
terminated abnormally
Oct 15 10:48:38 postgres[9550]: [9-3] #011#011before or while processing the
request.
Oct 15 10:48:53 postgres[9586]: [9-1] LOG: started streaming WAL from
primary at 6/BE000000 on timeline 3
Oct 15 10:49:32 postgres[9586]: [10-1] FATAL: could not receive data from
WAL stream: ERROR: epoll_create1 failed: Too many open f
iles in system
Oct 15 10:49:32 postgres[40098]: [16-1] LOG: invalid record length at
6/BEABD0F8: wanted 24, got 0
Oct 15 10:50:31 postgres[9765]: [9-1] LOG: started streaming WAL from
primary at 6/BE000000 on timeline 3
Oct 15 10:50:39 postgres[9765]: [10-1] FATAL: could not receive data from
WAL stream: ERROR: epoll_create1 failed: Too many open files in system
Oct 15 10:50:39 postgres[40098]: [19-1] LOG: invalid resource manager ID
101 at 6/BEAE3708
Oct 15 10:51:01 postgres[9767]: [9-1] FATAL: could not connect to the
primary server: server closed the connection unexpectedly
Oct 15 10:51:01 postgres[9767]: [9-2] #011#011This probably means the server
terminated abnormally
Oct 15 10:51:01 postgres[9767]: [9-3] #011#011before or while processing the
request.
Oct 15 10:51:06 postgres[9783]: [9-1] LOG: started streaming WAL from
primary at 6/BE000000 on timeline 3
Oct 15 10:51:39 postgres[9783]: [10-1] FATAL: could not receive data from
WAL stream: ERROR: epoll_create1 failed: Too many open files in system
Oct 15 10:51:39 postgres[40098]: [20-1] LOG: record with incorrect
prev-link 3E/746F6F72 at 6/BEC3D5E8
Oct 15 11:03:39 postgres[10850]: [9-1] FATAL: could not connect to the
primary server: FATAL: could not open relation mapping file
"global/pg_filenode.map": Too many open files in system
Oct 15 12:00:10 postgres[40101]: [3732-1] LOG: restartpoint complete: wrote
61 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 re
cycled; write=6.159 s, sync=0.018 s, total=6.223 s; sync files=16,
longest=0.006 s, average=0.001 s; distance=451 kB, estimate=2215 kB
Oct 15 12:01:40 postgres[15182]: [10-1] FATAL: could not receive data from
WAL stream: ERROR: epoll_create1 failed: Too many open
files in system
Oct 15 12:01:40 postgres[40098]: [80-1] LOG: invalid resource manager ID 64
at 6/BEE34F00
Oct 15 12:01:42 postgres[15711]: [9-1] LOG: started streaming WAL from
primary at 6/BE000000 on timeline 3
Oct 15 12:05:38 postgres[15711]: [10-1] FATAL: could not receive data from
WAL stream: ERROR: epoll_create1 failed: Too many open files in system
Oct 15 12:05:38 postgres[40098]: [81-1] LOG: unexpected pageaddr 6/A1E6E000
in log segment 0000000300000006000000BE, offset 15130624
Oct 16 01:34:15 postgres[18890]: [9-1] LOG: started streaming WAL from
primary at 6/C3000000 on timeline 3
Oct 16 01:34:15 postgres[18890]: [10-1] FATAL: could not receive data from
WAL stream: ERROR: could not open file "pg_wal/0000000300000006000000C3":
Too many open files in system
Oct 16 17:58:44 postgres[38030]: [9-1] FATAL: the database system is
starting up
Oct 16 17:58:47 postgres[38029]: [9-1] LOG: entering standby mode
Oct 16 17:58:47 postgres[38029]: [10-1] LOG: redo starts at 6/CF9B3598
Oct 16 17:58:47 postgres[38029]: [11-1] LOG: consistent recovery state
reached at 6/CF9C53B8
Oct 16 17:58:47 postgres[38026]: [8-1] LOG: database system is ready to
accept read only connections
Oct 16 17:58:48 postgres[38029]: [12-1] FATAL: invalid memory alloc request
size 1715044352
Oct 16 17:58:48 postgres[38026]: [9-1] LOG: startup process (PID 38029)
exited with exit code 1
Oct 16 17:58:48 postgres[38026]: [10-1] LOG: terminating any other active
server processes
Oct 16 17:58:48 postgres[38026]: [11-1] LOG: database system is shut down
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2018-10-18 22:39:34 | Re: NOTIFY does not work as expected |
Previous Message | Euler Taveira | 2018-10-18 16:52:49 | Re: parallel queries hang |