On 12/1/21 3:29 PM, James Sewell wrote:
>
> Looking again into the back trace[1], it appeared that the backend is
> getting stuck while getting the subtransaction's parent information,
> and that can only happen if the snapshot is getting marked as
> overflowed. So it seems that some of the scripts are using a lot of
> sub-transaction (> 64) within a transaction and in such case, we can
> not cache the sub-transaction information and the snapshot will be
> marked overflowed. After that, for checking the visibility of any
> tuple we need to consult the SLRU to get TopMost transaction
> information and if the subtransaction SLRU cache is full then it will
> have to do the I/O. So in this case it appeared that a lot of
> parallel scans/new connections are trying to read the pg_subtrans and
> due to limited cache size that is not even able to load the data in
> the cache and getting frequently blocked. So I assume that when you
> say the startup is getting stuck forever, how long did you wait, maybe
> it is moving but getting stuck again for the next tuple, so to confirm
> that attach the process with gdb put breakpoint in XidInMVCCSnapshot()
> and check whether it is hitting that breakpoint again or not. Also,
> check whether you are seeing "SubtransControlLock" in
> pg_stat_activity. There is one nice blog[2] explaining this issue so
> you can refer to that as well.
>
>
> Ah. Snap. This makes a lot of sense. The developers have a nasty habit
> of using LOOP over rows which has never been stamped out. I bet they
> have applied this to something with a lot of rows.
>
> Is there a way of seeing which statements / backends / processes have
> greater than 64 stxns? I'd like to monitor this.
>
> - James
>
> ------------------------------------------------------------------------
> The contents of this email are confidential and may be subject to
> legal or professional privilege and copyright. No representation is
> made that this email is free of viruses or other defects. If you have
> received this communication in error, you may not copy or distribute
> any part of it or otherwise disclose its contents to anyone. Please
> advise the sender of your incorrect receipt of this correspondence.
Also be on the look-out for begin trans; begin trans; etc I read
Kumar's report as nested transactions. If your gang is doing a
transaction per row, they need a kick in the rear. Anyone not closing
those needs a pink slip.