From: | Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
---|---|
To: | James Sewell <james(dot)sewell(at)jirotech(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-general <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Max connections reached without max connections reached |
Date: | 2021-12-02 04:05:36 |
Message-ID: | CAFiTN-t8JS63URu7CcyJ5ckd9RPyou7x73F7f29Uo2fPCUULnw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On Thu, Dec 2, 2021 at 3:59 AM James Sewell <james(dot)sewell(at)jirotech(dot)com> wrote:
>
>
>> Looking again into the back trace[1], it appeared that the backend is
>> getting stuck while getting the subtransaction's parent information,
>> and that can only happen if the snapshot is getting marked as
>> overflowed. So it seems that some of the scripts are using a lot of
>> sub-transaction (> 64) within a transaction and in such case, we can
>> not cache the sub-transaction information and the snapshot will be
>> marked overflowed. After that, for checking the visibility of any
>> tuple we need to consult the SLRU to get TopMost transaction
>> information and if the subtransaction SLRU cache is full then it will
>> have to do the I/O. So in this case it appeared that a lot of
>> parallel scans/new connections are trying to read the pg_subtrans and
>> due to limited cache size that is not even able to load the data in
>> the cache and getting frequently blocked. So I assume that when you
>> say the startup is getting stuck forever, how long did you wait, maybe
>> it is moving but getting stuck again for the next tuple, so to confirm
>> that attach the process with gdb put breakpoint in XidInMVCCSnapshot()
>> and check whether it is hitting that breakpoint again or not. Also,
>> check whether you are seeing "SubtransControlLock" in
>> pg_stat_activity. There is one nice blog[2] explaining this issue so
>> you can refer to that as well.
>
>
> Ah. Snap. This makes a lot of sense. The developers have a nasty habit of using LOOP over rows which has never been stamped out. I bet they have applied this to something with a lot of rows.
>
> Is there a way of seeing which statements / backends / processes have greater than 64 stxns? I'd like to monitor this.
I think there is no such view or anything which tells about which
backend or transaction has more than 64 sub transaction. But if we
are ready to modify the code then we can LOG that information in
GetNewTransactionId(), when first time we are marking it overflown.
if (nxids < PGPROC_MAX_CACHED_SUBXIDS)
{
MyProc->subxids.xids[nxids] = xid;
pg_write_barrier();
MyProc->subxidStatus.count = substat->count = nxids + 1;
}
else
{
MyProc->subxidStatus.overflowed = substat->overflowed = true;
<-- we can log or put breakpoint here and identify which statement is
creating oeverflow-->
}
IMHO, it is good to LOG such information if we are not already logging
this anywhere.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Lewis | 2021-12-02 04:07:11 | Re: Max connections reached without max connections reached |
Previous Message | Rob Sargent | 2021-12-02 03:54:20 | Re: Max connections reached without max connections reached |