BUG #18177: certain queries under certain contexts take multiple orders of magnitude longer compared to v10

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: rleach(at)princeton(dot)edu
Subject: BUG #18177: certain queries under certain contexts take multiple orders of magnitude longer compared to v10
Date: 2023-11-01 23:02:21
Message-ID: 18177-a282c2eaaf791f21@postgresql.org
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 18177
Logged by: Robert Leach
Email address: rleach(at)princeton(dot)edu
PostgreSQL version: 13.12
Operating system: macOS and linux
Description:

We upgraded Django on a large project from 3.2 to 4.2 (and postgres from 10
to 13) and found that our test suite, which typically took about 25-30
minutes, roughly doubled in running time (50 to 75 minutes). Switching back
to Django 3.2 while keeping postgres 13 exhibits the same slowdown.
I isolated one test that normally took 90s and now consistently takes 400s,
and further narrowed it down to a single (simple) query. That test runs
fast when run by itself, but when run after a series of other tests, it
takes 400s.

I extracted the SQL generated for the query and ran `EXPLAIN ANALYZE` on
it, which showed that the actual time was wildly out of line with the
estimated cost. Running VACUUM FULL ANALYZE restored the speed, but the
only place I could call it from that got the speed back caused weird errors
in other tests.

I cannot claim to understand the bug that is causing this issue, so the best
I can do is simply provide the explain output and try to keep from providing
confusing details, because this is outside the realm of my expertise:

```
Aggregate  (cost=45.77..45.78 rows=1 width=8) (actual time=202326.210..202326.214 rows=1 loops=1)
  ->  Unique  (cost=0.88..45.76 rows=1 width=1052) (actual time=10492.963..202326.205 rows=1 loops=1)
        ->  Nested Loop  (cost=0.88..45.75 rows=1 width=1052) (actual time=10492.959..202326.188 rows=8 loops=1)
              Join Filter: ("DataRepo_peakdata".id = "DataRepo_peakdatalabel".peak_data_id)
              Rows Removed by Join Filter: 3076
              ->  Nested Loop  (cost=0.62..37.47 rows=1 width=8) (actual time=10492.455..202323.935 rows=8 loops=1)
                    Join Filter: ("DataRepo_peakgroup".id = "DataRepo_peakdata".peak_group_id)
                    Rows Removed by Join Filter: 7888
                    ->  Index Scan using "DataRepo_peakdata_peak_group_id_4dd87f4a" on "DataRepo_peakdata"  (cost=0.25..8.26 rows=1 width=8) (actual time=0.017..7.149 rows=7896 loops=1)
                    ->  Nested Loop  (cost=0.38..29.20 rows=1 width=4) (actual time=2.942..25.621 rows=1 loops=7896)
                          Join Filter: ("DataRepo_peakgroup".peak_annotation_file_id = "DataRepo_archivefile".id)
                          ->  Nested Loop  (cost=0.38..18.43 rows=1 width=8) (actual time=2.935..25.614 rows=1 loops=7896)
                                Join Filter: ("DataRepo_msrun".sample_id = "DataRepo_sample".id)
                                Rows Removed by Join Filter: 83
                                ->  Nested Loop  (cost=0.12..10.15 rows=1 width=12) (actual time=0.028..2.513 rows=84 loops=7896)
                                      Join Filter: ("DataRepo_peakgroup".msrun_id = "DataRepo_msrun".id)
                                      Rows Removed by Join Filter: 3486
                                      ->  Index Scan using "DataRepo_peakgroup_peak_annotation_file_id_6dc2fc25" on "DataRepo_peakgroup"  (cost=0.12..8.14 rows=1 width=12) (actual time=0.006..0.464 rows=84 loops=7896)
                                            Filter: ((name)::text = 'serine'::text)
                                            Rows Removed by Filter: 1008
                                      ->  Seq Scan on "DataRepo_msrun"  (cost=0.00..2.00 rows=1 width=8) (actual time=0.013..0.017 rows=42 loops=663264)
                                ->  Index Scan using "DataRepo_sample_pkey" on "DataRepo_sample"  (cost=0.25..8.27 rows=1 width=4) (actual time=0.194..0.275 rows=1 loops=663264)
                                      Filter: ((name)::text = 'xzl5_panc'::text)
                                      Rows Removed by Filter: 281
                          ->  Seq Scan on "DataRepo_archivefile"  (cost=0.00..10.75 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=7896)
                                Filter: ((filename)::text = 'alafasted_cor.xlsx'::text)
              ->  Index Scan using "DataRepo_peakdatalabel_peak_data_id_f6151d4a" on "DataRepo_peakdatalabel"  (cost=0.25..8.27 rows=1 width=12) (actual time=0.007..0.217 rows=386 loops=8)
                    Filter: ((element)::text = 'C'::text)
                    Rows Removed by Filter: 180
Planning Time: 0.963 ms
Execution Time: 202327.334 ms
```
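To make the misestimation concrete, here is a small standalone script (my own illustration, not part of our test suite) that pulls the estimated and actual row counts out of the worst line of the plan above, the index scan on "DataRepo_peakdata":

```python
import re

# One line copied from the EXPLAIN ANALYZE output above: the index scan
# that feeds the outer nested loop.
plan_line = (
    '->  Index Scan using "DataRepo_peakdata_peak_group_id_4dd87f4a" '
    'on "DataRepo_peakdata"  (cost=0.25..8.26 rows=1 width=8) '
    '(actual time=0.017..7.149 rows=7896 loops=1)'
)

# The planner's row estimate lives in the (cost=...) group; the real
# count lives in the (actual ...) group.
estimated = int(re.search(r'cost=[\d.]+ rows=(\d+)', plan_line).group(1))
actual = int(re.search(r'actual time=[\d.]+ rows=(\d+)', plan_line).group(1))

print(f"estimated rows: {estimated}")  # 1
print(f"actual rows:    {actual}")     # 7896
```

That 1-row estimate versus 7,896 actual rows is what repeats the inner nested loops 7,896 times (and 7,896 × 84 = 663,264 scans of "DataRepo_msrun", matching the loops=663264 in the plan).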

There is a thread on the Django forum with many more details, but at the
outset of that discussion I didn't even know what was causing the slowness.
For all I knew, it was some other change to our code; with the help of the
folks on the Django forum, though, I now believe this is conclusively due
to postgres 13.

https://forum.djangoproject.com/t/test-time-doubled-after-django-3-2-4-2-and-postgres-10-13-update/24843/16

If this is a known issue, is there a version of postgres (after v13) that
resolves it?
