Re: Active sessions does not terminated due to statement_timeout

From: Ц <pfunk(at)mail(dot)ru>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Active sessions does not terminated due to statement_timeout
Date: 2024-03-27 15:56:52
Message-ID: 1711555012.669414748@f710.i.mail.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


After terminating session file has been updated with new lines.
 
***content from the previous letter***
 
***That one unfinished line. Freeze for ~6 hours.***
epoll_wait(4, 0x555d738e4f30, 1, -1)    = -1 EINTR (Interrupted system call) <21328.371590>
--- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=2340453, si_uid=2001} ---
kill(2292604, SIGURG)                   = 0 <0.000056>
rt_sigreturn({mask=[URG]})              = -1 EINTR (Interrupted system call) <0.000027>
write(2, "\0\0;\1|\373\"\0t2024-03-27 18:43:56 MSK"..., 324) = 324 <0.000027>
rt_sigprocmask(SIG_SETMASK, [URG], NULL, 8) = 0 <0.000018>
close(64)                               = 0 <0.000041>
close(218)                              = 0 <0.000029>
close(220)                              = 0 <0.000027>
close(219)                              = 0 <0.000023>
unlink("base/121442067/t4_182045966")   = 0 <0.000520>
unlink("base/121442067/t4_182045966.1") = -1 ENOENT (No such file or directory) <0.000087>
unlink("base/121442067/t4_182045966_fsm") = -1 ENOENT (No such file or directory) <0.000060>
unlink("base/121442067/t4_182045966_vm") = -1 ENOENT (No such file or directory) <0.000053>
unlink("base/121442067/t4_182045966_init") = -1 ENOENT (No such file or directory) <0.000061>
unlink("base/121442067/t4_182045963")   = 0 <0.000454>
 
***a lot of unlink***
 
unlink("base/121442067/t4_182045892")   = 0 <0.000046>
unlink("base/121442067/t4_182045892.1") = -1 ENOENT (No such file or directory) <0.000027>
unlink("base/121442067/t4_182045892_fsm") = -1 ENOENT (No such file or directory) <0.000023>
unlink("base/121442067/t4_182045892_vm") = -1 ENOENT (No such file or directory) <0.000021>
unlink("base/121442067/t4_182045892_init") = -1 ENOENT (No such file or directory) <0.000027>
close(407)                              = 0 <0.000032>
stat("base/pgsql_tmp/pgsql_tmp2292604.0", {st_mode=S_IFREG|0600, st_size=19141117, ...}) = 0 <0.000024>
unlink("base/pgsql_tmp/pgsql_tmp2292604.0") = 0 <0.010861>
sendto(9, "\24\0\0\0\30\0\0\0\23\17=\7]U\0\0\375\21$\1\0\0\0\0", 24, 0, NULL, 0) = 24 <0.000067>
write(2, "\0\0\265\0|\373\"\0t2024-03-27 18:43:56 MSK"..., 190) = 190 <0.000030>
brk(0x555d798fa000)                     = 0x555d798fa000 <0.002482>
brk(0x555d78902000)                     = 0x555d78902000 <0.003686>
brk(0x555d76f75000)                     = 0x555d76f75000 <0.008538>
openat(AT_FDCWD, "global/pg_filenode.map", O_RDONLY) = 64 <0.000043>
read(64, "\27'Y\0+\0\0\0\356\4\0\0r\316Y\10\224\v\0\0y\316Y\10\275\4\0\0\177\316Y\10"..., 512) = 512 <0.000022>
close(64)                               = 0 <0.000019>
openat(AT_FDCWD, "base/121442067/pg_filenode.map", O_RDONLY) = 64 <0.000027>
read(64, "\27'Y\0\21\0\0\0\353\4\0\0\3\26?\10\341\4\0\0\334\32?\10\347\4\0\0\347\4\0\0"..., 512) = 512 <0.000018>
close(64)                               = 0 <0.000018>
close(20)                               = 0 <0.000019>
 
***a lot of close***
 
close(138)                              = 0 <0.000019>
openat(AT_FDCWD, "base/121442067/138352131", O_RDWR) = 5 <0.000052>
lseek(5, 0, SEEK_END)                   = 149766144 <0.000018>
openat(AT_FDCWD, "base/121442067/2601", O_RDWR) = 6 <0.000029>
lseek(6, 0, SEEK_END)                   = 8192 <0.000017>
kill(21, SIGURG)                        = 0 <0.000035>
sendto(9, "\31\0\0\0\20\0\0\0\23\17=\7\4\0\0\0", 16, 0, NULL, 0) = 16 <0.000078>
sendto(9, "\2\0\0\0\300\3\0\0\23\17=\7\10\0\0\0\3\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 960, 0, NULL, 0) = 960 <0.000036>
sendto(9, "\2\0\0\0\300\3\0\0\23\17=\7\10\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 960, 0, NULL, 0) = 960 <0.000041>
 
***a lot of sendto***
 
sendto(9, "\20\0\0\0H\0\0\0\6\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\r\0\0\0\0\0\0\0"..., 72, 0, NULL, 0) = 72 <0.000030>
write(2, "\0\0\322\0|\373\"\0t2024-03-27 18:43:56 MSK"..., 219) = 219 <0.000050>
futex(0x7fd5aacbf6c4, FUTEX_WAKE_PRIVATE, 2147483647) = 0 <0.000020>
exit_group(1)                           = ?
+++ exited with 1 +++
 
PS: I guess "No such file or directory" messages due to execution of strace on host, and postgres process is executing inside docker.
>Среда, 27 марта 2024, 18:39 +03:00 от Ц <pfunk(at)mail(dot)ru>:

>I was able to connect to this process using

>strace -T -p `pgrep -n -u ks-postgres -f "ocb.*FETCH"` -o strace_of_active_session -ff
>(the -T flag gives us the time the system call has been running at the end of each line).

>If the session had been hanging active for some time, there was only one unfinished line in strace:
>epoll_wait(4,


>But if we got it a bit earlier, we got this output:

>***something before process was catched***

>epoll_wait(4, [{EPOLLOUT, {u32=1938706152, u64=93859859025640}}], 1, -1) = 1 <0.003515>
>sendto(10, "1.211 \320\224\320\232Z200.21121002 \320\240\320\232211)"..., 1968, 0, NULL, 0) = 1968 <0.000068>
>sendto(10, " \320\277\320\276\320\273 \321\204\320\265\320\262\321\200\320\260\320\273\321\214 \320\235\320\224\320\241 \320\275"..., 8192, 0, NULL, 0) = 8192 <0.000051>
>pread64(407, "y\221\275\0\0\0\0\0\0\342\275[\0\325\320\244\320\232\320\243 \320\220\320\264\320\274\320\270\320\275\320"..., 8192, 10354688) = 8192 <0.000065>
>sendto(10, "\377\377\0\0\0\270(907.0702.1340380180.111.2"..., 8192, 0, NULL, 0) = 8192 <0.000066>
>pread64(407, "00057476 \320\276\321\202 31.01.2024 \320\262 \321\202.\321"..., 8192, 10362880) = 8192 <0.000044>
>sendto(10, "7233430 \320\225\320\264\320\270\320\275\321\213\320\271 \320\275\320\260\320\273\320\276\320\263\320"..., 8192, 0, NULL, 0) = 8192 <0.000073>

>***a lot of pread64, sendto***

>sendto(10, "010006300\377\377\377\377\0\0\0X\320\241\320\274\320\276\320\273\320\265\320\275\321\201\320"..., 8192, 0, NULL, 0) = 8192 <0.000023>
>pread64(407, "901001\10\2\0\0\30\0\3\0 y\221\235\0\0\0\0\0\0\34\202[\0\20\2\0\0"..., 8192, 16949248) = 8192 <0.000020>
>sendto(10, "\321\200\320\265\320\265\321\201\321\202\321\200\321\203 \342\204\226 24119 \320\276\321\202 19"..., 8192, 0, NULL, 0) = 8192 <0.000024>
>sendto(10, ") \320\273/\321\20120907234120  \320\221\320\23690741220"..., 8192, 0, NULL, 0) = 8192 <0.000022>
>pread64(407, "\320\275\321\201\320\272\320\260 (\320\243\320\277\321\200\320\260\320\262\320\273\320\265\320\275\320\270\320\265 \320"..., 8192, 16957440) = 8192 <0.000019>
>sendto(10, "632\0\0\0\n7707083893\0\0\0\0\0\0\0\t6730020"..., 8192, 0, NULL, 0) = 8192 <0.000022>
>pread64(407, "\1\0\0\30\0\3\0 y\221\275\0\0\0\0\0\0\356\367[\0\375\320\244\320\232\320\243 \320\220\320"..., 8192, 16965632) = 8192 <0.000020>
>sendto(10, "\0\0\0\02403234643667010006300\377\377\377\377\0\0\0d"..., 8192, 0, NULL, 0) = 6632 <0.000026>
>sendto(10, "\320\260\321\206\320\270\320\270 \320\263\320\276\321\200\320\276\320\264\320\260 \320\241\320\274\320\276\320\273\320\265"..., 1560, 0, NULL, 0) = -1 EAGAIN (Resource temporarily unavailable) <0.000019>
>epoll_wait(4, 0x555d738e4f30, 1, -1)    = -1 EINTR (Interrupted system call) <2.855325>
>--- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_USER, si_pid=2292607, si_uid=2001} ---
>kill(2292604, SIGURG)                   = 0 <0.000023>
>rt_sigreturn({mask=[URG]})              = -1 EINTR (Interrupted system call) <0.000052>
>sendto(10, "\320\260\321\206\320\270\320\270 \320\263\320\276\321\200\320\276\320\264\320\260 \320\241\320\274\320\276\320\273\320\265"..., 1560, 0, NULL, 0) = 1560 <0.000033>
>pread64(407, "\320\265\321\202.\23\1\202\4\0J\6\210\23\025044525411\0276732208"..., 8192, 16973824) = 8192 <0.000026>
>sendto(10, "2.24,4085 \320\276\321\202 21.02.24 \320\272-\321\202 01"..., 8192, 0, NULL, 0) = 8192 <0.000025>
>pread64(407, "\276\320\263. 59003491 \320\276\321\202 16.11.2007 \320\235"..., 8192, 16982016) = 8192 <0.000020>

>***a lot of pread64, sendto***

>pread64(407, "\260 (\320\234\320\221\320\236\320\243 \"\320\241\320\250 \342\204\226 11\")+40702"..., 8192, 17375232) = 8192 <0.000021>
>sendto(10, "\274\320\276\320\273\320\265\320\275\321\201\320\272\320\276\320\265 \320\276\321\202\320\264\320\265\320\273\320\265\320\275"..., 8192, 0, NULL, 0) = 4000 <0.000025>
>sendto(10, "\n6731030957\0\0\0\0\0\0\0\t673001001\377\377\377\377"..., 4192, 0, NULL, 0) = -1 EAGAIN (Resource temporarily unavailable) <0.000019>
>epoll_wait(4, [{EPOLLIN, {u32=1938706176, u64=93859859025664}}], 1, -1) = 1 <0.000018>
>read(11, "\27\0\0\0\0\0\0\0\0\0\0\0|\373\"\0\321\7\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 128 <0.000020>
>epoll_wait(4, 0x555d738e4f30, 1, -1)    = -1 EINTR (Interrupted system call) <4.214523>
>--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
>kill(2292604, SIGURG)                   = 0 <0.000028>
>setitimer(ITIMER_REAL, {it_interval={tv_sec=0, tv_usec=0}, it_value={tv_sec=1790, tv_usec=905739}}, NULL) = 0 <0.000020>
>rt_sigreturn({mask=[URG]})              = -1 EINTR (Interrupted system call) <0.000018>
>sendto(10, "\n6731030957\0\0\0\0\0\0\0\t673001001\377\377\377\377"..., 4192, 0, NULL, 0) = -1 EAGAIN (Resource temporarily unavailable) <0.000023>
>epoll_wait(4, [{EPOLLIN, {u32=1938706176, u64=93859859025664}}], 1, -1) = 1 <0.000017>
>read(11, "\27\0\0\0\0\0\0\0\0\0\0\0|\373\"\0\321\7\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 128 <0.000021>

>***First hang in strace output for  <1790.905479> seconds***
>***It was looking like last line in this file***
>***epoll_wait(4,  ***

>epoll_wait(4, 0x555d738e4f30, 1, -1)    = -1 EINTR (Interrupted system call) <1790.905479>
>--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
>kill(2292604, SIGURG)                   = 0 <0.000035>
>kill(-2292604, SIGINT)                  = 0 <0.000020>
>--- SIGINT {si_signo=SIGINT, si_code=SI_USER, si_pid=2292604, si_uid=2001} ---
>rt_sigreturn({mask=[ALRM URG]})         = 0 <0.000018>
>kill(2292604, SIGINT)                   = 0 <0.000017>
>--- SIGINT {si_signo=SIGINT, si_code=SI_USER, si_pid=2292604, si_uid=2001} ---
>rt_sigreturn({mask=[ALRM URG]})         = 0 <0.000016>
>rt_sigreturn({mask=[URG]})              = -1 EINTR (Interrupted system call) <0.000016>
>sendto(10, "\n6731030957\0\0\0\0\0\0\0\t673001001\377\377\377\377"..., 4192, 0, NULL, 0) = -1 EAGAIN (Resource temporarily unavailable) <0.000025>
>epoll_wait(4, [{EPOLLIN, {u32=1938706176, u64=93859859025664}}], 1, -1) = 1 <0.000020>
>read(11, "\27\0\0\0\0\0\0\0\0\0\0\0|\373\"\0\321\7\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 128 <0.000018>
>epoll_wait(4,

>***last line looks incomplete and it hangs forever***



>When I first noticed that the file was not growing, the last line was the same incomplete one
>epoll_wait(4 ,
>and I thought that nothing would change.

>But later I noticed that the file had changed.
>The line was formed in:
>epoll_wait(4, 0x555d738e4f30, 1, -1) = -1 EINTR (Interrupted system call) <1790.905479>
>telling us that epoll_wait had been hanging for 30 minutes.

>An there were added 13 more lines with the same ending epoll_wait(4,
>...but it was already final.

>Here are the values of the possibly involved parameters:

>                name                 | setting | unit
>-------------------------------------+---------+------
> deadlock_timeout                    | 1000    | ms
> idle_in_transaction_session_timeout | 30000   | ms
> idle_session_timeout                | 60000   | ms
> statement_timeout                   | 1800000 | ms
> tcp_keepalives_count                | 4       |
> tcp_keepalives_idle                 | 30      | s
> tcp_keepalives_interval             | 5       | s
> tcp_user_timeout                    | 0       | ms


>PS: found an issue in node-postgres with similar symptoms but can't figure out how their solution can be applied in my case.
>https://github.com/brianc/node-postgres/issues/1952
>   
>>Вторник, 26 марта 2024, 17:19 +03:00 от Ц < pfunk(at)mail(dot)ru >:
>> 
>>Greetings!
>>I’ve faced with strange behavior when I see a lot of active sessions started hours ago while statement_timeout = '30min'.
>>All of them are fetching from cursors.
>> 
>>Typical session looks like:
>>backend_start    | 2024-03-26 14:34:20.552594+03
>>xact_start           | 2024-03-26 14:34:54.974628+03
>>query_start         | 2024-03-26 14:35:02.024133+03
>>state_change     | 2024-03-26 14:35:02.024134+03
>>wait_event_type | Client
>>wait_event          | ClientWrite
>>state                   | active
>>backend_xid       | 23240392
>>backend_xmin    | 23226474
>>query                   | fetch all from "<unnamed portal 20>"
>>backend_type     | client backend
>> 
>> 
>>They are accumulating up to tens by the end of the day with all negative impacts on performance.
>>Initially I thought that clients already died but due to network issues database considers them to be alive. So I set tcp_keepalive GUCs to nonzero values. Without success.
>>Then I checked connections from the app server side and found them in ESTABLISHED state.
>>It's certainly an application fault and it should not hold cursor forever...but
>> 
>>Is the any GUC parameters to fight with such «clients»?
>> 
>> 




 
 
 
 

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Adrian Klaver 2024-03-27 16:01:34 Re: Query on Postgres SQL transaction
Previous Message Ц 2024-03-27 15:39:22 Re: Active sessions does not terminated due to statement_timeout