| From: | Mark Simonetti <marks(at)opalsoftware(dot)co(dot)uk> | 
|---|---|
| To: | "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org> | 
| Subject: | Hang on NOTIFY | 
| Date: | 2015-08-07 11:32:38 | 
| Message-ID: | 55C49756.70505@opalsoftware.co.uk | 
| Lists: | pgsql-bugs | 
The system I am developing makes extensive use of the async 
NOTIFY/LISTEN system.
I am currently experiencing a problem on 2 production servers:
Server 1:
Virtual Windows Server 2008 R2 (VMWare)
PostgreSQL 9.3.5
Server 2:
Virtual Windows Server 2008 R2 (VMWare)
PostgreSQL 9.4.2
After the system has been running for a while (sometimes a few days, 
sometimes a few weeks), any call to NOTIFY will hang.
After in-depth investigation, it appears to happen when a listening 
backend has been connected for some time (days).
Any other backend trying to signal that backend will hang on 
"CallNamedPipe" in pgkill (kill.c).
Here is a stack trace from the hung SENDING backend, main thread : -
      ntdll.dll!_NtFsControlFile@40()  + 0x15 bytes
      ntdll.dll!_NtFsControlFile@40()  + 0x15 bytes
      kernel32.dll!_CallNamedPipeW@28()  + 0xf4 bytes
      postgres.exe!pgkill(int pid, int sig)  Line 43 + 0x2b bytes  C
      postgres.exe!SendProcSignal(int pid, ProcSignalReason reason, int backendId)  Line 198 + 0x10 bytes    C
      postgres.exe!SignalBackends()  Line 1497 + 0xe bytes    C
 >    postgres.exe!ProcessCompletedNotifies()  Line 1092    C
      postgres.exe!PostgresMain(int argc, char * * argv, const char * dbname, const char * username)  Line 3947    C
      postgres.exe!BackendRun(Port * port)  Line 4011 + 0x21 bytes  C
      postgres.exe!SubPostmasterMain(int argc, char * * argv)  Line 4515 + 0x8 bytes    C
      postgres.exe!main(int argc, char * * argv)  Line 203 + 0x7 bytes    C
      postgres.exe!__tmainCRTStartup()  Line 555 + 0x17 bytes    C
      kernel32.dll!@BaseThreadInitThunk@12()  + 0x12 bytes
      ntdll.dll!___RtlUserThreadStart@8()  + 0x27 bytes
      ntdll.dll!__RtlUserThreadStart@8()  + 0x1b bytes
Here is a stack trace from the signalling thread (I know it's irrelevant 
as this is for incoming signals) : -
      ntdll.dll!_NtFsControlFile@40()  + 0x15 bytes
      ntdll.dll!_NtFsControlFile@40()  + 0x15 bytes
 >    postgres.exe!pg_signal_thread(void * param)  Line 279 + 0x9 bytes    C
Now for the RECIPIENT backend : -
      ntdll.dll!_ZwWaitForMultipleObjects@20()  + 0x15 bytes
      ntdll.dll!_ZwWaitForMultipleObjects@20()  + 0x15 bytes
      KERNELBASE.dll!_WaitForMultipleObjectsEx@20()  + 0x36 bytes
      kernel32.dll!_WaitForMultipleObjectsExImplementation@20()  + 0x8e bytes
 >    postgres.exe!pgwin32_waitforsinglesocket(unsigned int s, int what, int timeout)  Line 216 + 0x14 bytes    C
      postgres.exe!pgwin32_recv(unsigned int s, char * buf, int len, int f)  Line 352 + 0xa bytes    C
      postgres.exe!secure_read(Port * port, void * ptr, unsigned int len)  Line 304 + 0x12 bytes    C
      postgres.exe!pq_getbyte()  Line 895 + 0x67 bytes    C
      postgres.exe!SocketBackend(StringInfoData * inBuf)  Line 344 + 0x5 bytes    C
      postgres.exe!PostgresMain(int argc, char * * argv, const char * dbname, const char * username)  Line 3968 + 0x1c bytes    C
      postgres.exe!BackendRun(Port * port)  Line 4011 + 0x21 bytes  C
      postgres.exe!SubPostmasterMain(int argc, char * * argv)  Line 4515 + 0x8 bytes    C
      postgres.exe!main(int argc, char * * argv)  Line 203 + 0x7 bytes    C
      postgres.exe!__tmainCRTStartup()  Line 555 + 0x17 bytes    C
      kernel32.dll!@BaseThreadInitThunk@12()  + 0x12 bytes
      ntdll.dll!___RtlUserThreadStart@8()  + 0x27 bytes
      ntdll.dll!__RtlUserThreadStart@8()  + 0x1b bytes
This is the usual place for it to wait, so this seems okay.
      ntdll.dll!_NtFsControlFile@40()  + 0x15 bytes
      ntdll.dll!_NtFsControlFile@40()  + 0x15 bytes
 >    postgres.exe!pg_signal_thread(void * param)  Line 279 + 0x9 bytes    C
Also looks fine.
This seems like a possible Windows bug, as the call to CallNamedPipe has 
a timeout of 1000 milliseconds, but it is clearly not timing out.  It 
only seems to exit if I exit the backend it is trying to signal.
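For reference, here is a rough sketch of the call pattern in question. It is 
not the actual PostgreSQL source: the function names build_signal_pipe_name 
and send_signal_byte are hypothetical, and the pipe name format 
(\\.\pipe\pgsignal_<pid>) is what I understand kill.c to use, so treat it as 
an assumption. The Windows documentation describes the last argument of 
CallNamedPipe as the time to wait for the pipe to become available, so it may 
not bound the exchange once a connection has been made; that reading is a 
guess, not a confirmed diagnosis.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#ifdef _WIN32
#include <windows.h>
#endif

/*
 * Build the per-backend signal pipe name (assumed format:
 * \\.\pipe\pgsignal_<pid>).  Returns 0 on success, -1 if the
 * buffer is too small to hold the full name.
 */
static int build_signal_pipe_name(char *buf, size_t len, unsigned int pid)
{
    int n = snprintf(buf, len, "\\\\.\\pipe\\pgsignal_%u", pid);

    return (n < 0 || (size_t) n >= len) ? -1 : 0;
}

#ifdef _WIN32
/*
 * Send one signal byte to the target backend's pipe and wait for the
 * one-byte echo.  Returns 0 on success, -1 on failure.
 */
static int send_signal_byte(unsigned int pid, unsigned char sig)
{
    char  pipename[128];
    BYTE  sig_data = sig;
    BYTE  sig_ret = 0;
    DWORD bytes = 0;

    if (build_signal_pipe_name(pipename, sizeof(pipename), pid) != 0)
        return -1;

    /* The final argument (1000 ms) is the timeout discussed above. */
    if (!CallNamedPipeA(pipename, &sig_data, 1, &sig_ret, 1, &bytes, 1000))
        return -1;

    /* The receiver is expected to echo the same byte back. */
    return (bytes == 1 && sig_ret == sig_data) ? 0 : -1;
}
#endif
```

On Windows, send_signal_byte(pid, sig) would block inside CallNamedPipeA in 
the same way the sending backend's stack trace shows, if the receiving side 
never completes the exchange.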
NOTE: each sender is trying to signal many backends, but every stuck 
backend I checked was stuck sending to the same recipient.  
Closing that particular recipient DOES free everything up, and signals 
start flowing again.
I've searched around and cannot find a similar bug report.  Is it 
possibly something I'm doing wrong?
Thanks,
Mark.
--