Re: How to debug a connection that's "active" but hanging?

From: Jurrie Overgoor <postgresql-mailinglist(at)jurr(dot)org>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: How to debug a connection that's "active" but hanging?
Date: 2021-07-12 20:56:53
Message-ID: 1b783c37-0712-8b21-f1e3-3ac1f8c4f096@jurr.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 12-07-2021 20:56, Vijaykumar Jain wrote:
> On Mon, 12 Jul 2021 at 23:16, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us
> <mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us>> wrote:
>
>
> The backtraces you captured look like the query is not "hung", it's
> just computing away.
>
>
> He mentioned earlier that the query was hung as 'active' for 8 hours
> and on.
>
> incase this is due to bad plan,
> @Jurrie Overgoor <mailto:postgresql-mailinglist(at)jurr(dot)org>  is it also
> possible for you to run manually
>
> `vacuumdb  -a -v`  from the terminal, each time before you run your
> test suite for some runs, do you still get the same issue?
>
> I have a feeling repeated runs may have caused a lot of bloat on some
> tables which might have not been reclaimed by autovacuum runs.

I configured Jenkins to run that command prior to executing the tests. I
got 5 successful runs, no hanging queries. Then I reverted and ran
again. The first and second run were ok; the third run hung again. So
your hunch might be right.

On 12-07-2021 19:46, Tom Lane wrote:
> You might need
> to investigate by altering your application to capture "EXPLAIN ..."
> output just before the troublesome query, so you can see if it gets
> a different plan in the slow cases.

Then I tried this. The query plans are indeed not consistent.

Most of the time the first line of the query plan is: Unique 
(cost=4892.35..4892.35 rows=1 width=64) [1]

I have seen other costs: 5818.30, 6350.85 and 6514.73. They all complete
correctly. [2], [3], [4]

The plan that leaves the query hanging in the 'active' state starts
with: Unique  (cost=241.81..241.82 rows=1 width=64) [5]

That's clearly much lower than the rest. So I suspect the planner making
a 'wrong' guess there, causing a bad plan, and a long time to execute.
For reference, the executed query is [6].

Now, where to go from here? Is this considered a bug in PostgreSQL, or
am I misusing the database engine by doing DROP DATABASE and CREATE
DATABASE over and over again? I must say that I never saw this behavior
on PostgreSQL 9.6, so in that regard it might be considered a bug.....?

What can I do to get to the bottom of this? Should I export the content
of some metadata tables prior to executing the hanging query? Should I
`vacuumdb -a -v` prior to logging the EXPLAIN for the hanging query?

With kind regards,

Jurrie

[1] https://jurr.org/PostgreSQL_13_hanging_query/normal.txt
[2] https://jurr.org/PostgreSQL_13_hanging_query/alt1.txt
[3] https://jurr.org/PostgreSQL_13_hanging_query/alt3.txt
[4] https://jurr.org/PostgreSQL_13_hanging_query/alt2.txt
[5] https://jurr.org/PostgreSQL_13_hanging_query/hang.txt
[6] https://jurr.org/PostgreSQL_13_hanging_query/query.txt

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2021-07-12 21:21:07 Re: How to debug a connection that's "active" but hanging?
Previous Message Vijaykumar Jain 2021-07-12 18:56:15 Re: How to debug a connection that's "active" but hanging?