I found more information. After the hung connection appeared, I noticed
several hundred connections from the same user. Since I use pgbouncer
and set the pool size to only 50 per user, this is very strange. I
checked the pgbouncer side: 'show pools' showed fewer than 50 active
server connections (only 35, actually). I also checked the client port
shown in the pg process list. It was not in use on the pgbouncer side
when I checked. So I stopped pgbouncer, and the connection count for
that user dropped slowly until all of those connections had disappeared.
After that I restarted pgbouncer, and things look good again.
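
The port comparison described above could be automated roughly as follows. This is only a sketch under assumptions: the rows are assumed to have been fetched already, `pg_rows` from `pg_stat_activity` (columns `pid`, `usename`, `client_addr`, `client_port`) and `bouncer_rows` from pgbouncer's `SHOW SERVERS` (columns `local_addr`, `local_port`), and pgbouncer is assumed to reach pg over TCP. The function name `find_orphan_backends` is made up for illustration.

```python
def find_orphan_backends(pg_rows, bouncer_rows, user):
    """Return pids of pg backends for `user` that pgbouncer does not know about.

    pg_rows:      dicts with pid, usename, client_addr, client_port
                  (assumed to come from pg_stat_activity).
    bouncer_rows: dicts with local_addr, local_port
                  (assumed to come from pgbouncer's SHOW SERVERS).
    """
    # Endpoints pgbouncer currently holds open towards pg.
    known = {(r["local_addr"], int(r["local_port"])) for r in bouncer_rows}

    orphans = []
    for row in pg_rows:
        if row["usename"] != user:
            continue
        # Unix-socket connections have no client address; skip them.
        if row["client_addr"] is None:
            continue
        endpoint = (row["client_addr"], int(row["client_port"]))
        if endpoint not in known:
            orphans.append(row["pid"])
    return sorted(orphans)
```

Any pid returned is a backend that pg still considers connected although pgbouncer no longer holds the matching socket, which is the situation I saw.
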
With this workaround, I at least don't have to kill pg when the problem
happens. But does anyone have a clue why this happens? What should I
check to find the root cause? One thing I forgot to check is the network
status of those orphaned connections on the pg side. I will check it
next time and see whether they are in an abnormal state.
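
For that network check, something like the sketch below might help summarize the TCP states of the suspect connections. It assumes the text was captured on the pg host with `ss -tan` and that the columns are State, Recv-Q, Send-Q, local address:port and peer address:port, in that order; the function name `tcp_states_for_ports` is hypothetical, and the peer ports would be the client ports taken from the pg process list.

```python
from collections import Counter


def tcp_states_for_ports(ss_output, peer_ports):
    """Count TCP states of sockets whose peer port is in `peer_ports`.

    ss_output is assumed to be the text printed by `ss -tan`, with the
    peer address:port in the fifth whitespace-separated column.
    """
    wanted = {int(p) for p in peer_ports}
    states = Counter()
    for line in ss_output.splitlines():
        fields = line.split()
        # Skip the header line and anything too short to be a socket row.
        if len(fields) < 5 or fields[0] == "State":
            continue
        state, peer = fields[0], fields[4]
        port_text = peer.rsplit(":", 1)[-1]
        if port_text.isdigit() and int(port_text) in wanted:
            states[state] += 1
    return dict(states)
```

If most of the orphaned sockets turn up in a state other than ESTAB (for example CLOSE-WAIT), that would suggest pg never noticed the peer closing the connection.
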