Re: Max connections reached without max connections reached

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: James Sewell <james(dot)sewell(at)jirotech(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Max connections reached without max connections reached
Date: 2021-12-01 08:50:38
Message-ID: CAFiTN-uANPvW5JoQXkt3ep5Jappu3ymuYeqs7yuqJhdkv5Mf5g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Tue, Nov 30, 2021 at 5:16 AM James Sewell <james(dot)sewell(at)jirotech(dot)com> wrote:
>>
>> How did you verify that, maybe some process started IO and stuck
>> there? Can we check pg_stat_activity that is there some process that
>> shows in the wait event as SLRURead/SLRUWrite and not coming out of
>> that state?

Looking again into the back trace[1], it appeared that the backend is
getting stuck while getting the subtransaction's parent information,
and that can only happen if the snapshot is getting marked as
overflowed. So it seems that some of the scripts are using a lot of
sub-transaction (> 64) within a transaction and in such case, we can
not cache the sub-transaction information and the snapshot will be
marked overflowed. After that, for checking the visibility of any
tuple we need to consult the SLRU to get TopMost transaction
information and if the subtransaction SLRU cache is full then it will
have to do the I/O. So in this case it appeared that a lot of
parallel scans/new connections are trying to read the pg_subtrans and
due to limited cache size that is not even able to load the data in
the cache and getting frequently blocked. So I assume that when you
say the startup is getting stuck forever, how long did you wait, maybe
it is moving but getting stuck again for the next tuple, so to confirm
that attach the process with gdb put breakpoint in XidInMVCCSnapshot()
and check whether it is hitting that breakpoint again or not. Also,
check whether you are seeing "SubtransControlLock" in
pg_stat_activity. There is one nice blog[2] explaining this issue so
you can refer to that as well.

[1]
21 #2 0x00007f18fb9dec6b in sem_wait@@GLIBC_2.2.5 () from
/lib64/libpthread.so.0
20 #3 0x000000000071c482 in PGSemaphoreLock ()
19 #4 0x000000000079078c in LWLockAcquire ()
18 #5 0x00000000004fa9ae in SimpleLruWaitIO.isra.1 ()
17 #6 0x00000000004fb2a4 in SimpleLruReadPage ()
16 #7 0x00000000004fbc07 in SubTransGetParent ()
15 #8 0x00000000004fbc57 in SubTransGetTopmostTransaction ()
14 #9 0x00000000008f65a7 in XidInMVCCSnapshot ()
13 #10 0x00000000008f662e in HeapTupleSatisfiesMVCC ()
12 #11 0x00000000004c436e in heapgetpage ()

[2]https://www.cybertec-postgresql.com/en/subtransactions-and-performance-in-postgresql/

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Yi Sun 2021-12-01 08:56:11 Re: ssl_crl_file Certificate Revocation List doesn't work for postgresql 11
Previous Message Frank Limpert 2021-12-01 07:50:56 Re: case insensitive collation of Greek's sigma