From: | Jurrie Overgoor <postgresql-mailinglist(at)jurr(dot)org> |
---|---|
To: | pgsql-general(at)lists(dot)postgresql(dot)org |
Subject: | Re: How to debug a connection that's "active" but hanging? |
Date: | 2021-07-12 20:56:53 |
Message-ID: | 1b783c37-0712-8b21-f1e3-3ac1f8c4f096@jurr.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On 12-07-2021 20:56, Vijaykumar Jain wrote:
> On Mon, 12 Jul 2021 at 23:16, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us
> <mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us>> wrote:
>
>
> The backtraces you captured look like the query is not "hung", it's
> just computing away.
>
>
> He mentioned earlier that the query was hung as 'active' for 8 hours
> and on.
>
> incase this is due to bad plan,
> @Jurrie Overgoor <mailto:postgresql-mailinglist(at)jurr(dot)org> is it also
> possible for you to run manually
>
> `vacuumdb -a -v` from the terminal, each time before you run your
> test suite for some runs, do you still get the same issue?
>
> I have a feeling repeated runs may have caused a lot of bloat on some
> tables which might have not been reclaimed by autovacuum runs.
I configured Jenkins to run that command prior to executing the tests. I
got 5 successful runs, no hanging queries. Then I reverted and ran
again. The first and second run were ok; the third run hung again. So
your hunch might be right.
On 12-07-2021 19:46, Tom Lane wrote:
> You might need
> to investigate by altering your application to capture "EXPLAIN ..."
> output just before the troublesome query, so you can see if it gets
> a different plan in the slow cases.
Then I tried this. The query plans are indeed not consistent.
Most of the time the first line of the query plan is: Unique
(cost=4892.35..4892.35 rows=1 width=64) [1]
I have seen other costs: 5818.30, 6350.85 and 6514.73. They all complete
correctly. [2], [3], [4]
The plan that leaves the query hanging in the 'active' state starts
with: Unique (cost=241.81..241.82 rows=1 width=64) [5]
That's clearly much lower than the rest. So I suspect the planner making
a 'wrong' guess there, causing a bad plan, and a long time to execute.
For reference, the executed query is [6].
Now, where to go from here? Is this considered a bug in PostgreSQL, or
am I misusing the database engine by doing DROP DATABASE and CREATE
DATABASE over and over again? I must say that I never saw this behavior
on PostgreSQL 9.6, so in that regard it might be considered a bug.....?
What can I do to get to the bottom of this? Should I export the content
of some metadata tables prior to executing the hanging query? Should I
`vacuumdb -a -v` prior to logging the EXPLAIN for the hanging query?
With kind regards,
Jurrie
[1] https://jurr.org/PostgreSQL_13_hanging_query/normal.txt
[2] https://jurr.org/PostgreSQL_13_hanging_query/alt1.txt
[3] https://jurr.org/PostgreSQL_13_hanging_query/alt3.txt
[4] https://jurr.org/PostgreSQL_13_hanging_query/alt2.txt
[5] https://jurr.org/PostgreSQL_13_hanging_query/hang.txt
[6] https://jurr.org/PostgreSQL_13_hanging_query/query.txt
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2021-07-12 21:21:07 | Re: How to debug a connection that's "active" but hanging? |
Previous Message | Vijaykumar Jain | 2021-07-12 18:56:15 | Re: How to debug a connection that's "active" but hanging? |