Re: BUG #18177: certain queries under certain contexts take multiple orders of magnitude longer compared to v10

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>, rleach(at)princeton(dot)edu, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18177: certain queries under certain contexts take multiple orders of magnitude longer compared to v10
Date: 2023-11-02 22:00:00
Message-ID: e61a37f3-eeeb-21cb-647d-0fc3771f94c9@enterprisedb.com
Lists: pgsql-bugs

On 11/2/23 11:45, David Rowley wrote:
> On Thu, 2 Nov 2023 at 21:56, PG Bug reporting form
> <noreply(at)postgresql(dot)org> wrote:
>> I cannot claim to understand the bug that is causing this issue, so the best
>> I can do is simply provide the explain output and try to keep from providing
>> confusing details, because this is outside the realm of my expertise:
>
>
>> (' -> Index Scan using
>> "DataRepo_peakdata_peak_group_id_4dd87f4a" on "DataRepo_peakdata"
>> (cost=0.25..8.26 rows=1 width=8) (actual time=0.017..7.149 rows=7896
>> loops=1)',)
>
> Nothing looks particularly bug like so far. It seems the
> pg_class.reltuples estimate for this relation is way out.
>
> Has autovacuum gotten to this table recently?
>
> select * from pg_stat_user_tables where relid = '"DataRepo_peakdata"'::regclass;
>
> The plan would likely come good if you analyzed that table. You
> should see if you can figure out why autovacuum hasn't analyzed it.
>
> For the future, it might be best to post EXPLAIN output as an
> attachment. The extra formatting makes it difficult to read.
>

I did look at the django forum, and the plans are there. For convenience
I'm attaching them here too. But there's nothing particularly revealing,
except that it indeed seems to be related to the index scan.

One thing that puzzles me is why it'd behave differently between PG10
and PG13. I'd have expected both releases to do the same thing if there
are no stats, but maybe PG10 manages to analyze the table in time for
some reason?

Would be interesting to see the pg_class data (reltuples, relpages and
timestamp of last analyze/autoanalyze) for the slow/fast cases.
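Something like this should show all of that in one go (standard catalog
columns; adjust the table name as needed):

    -- reltuples/relpages drive the row estimate; reltuples = -1
    -- (0 on releases before PG13) means never vacuumed/analyzed
    SELECT c.relname, c.reltuples, c.relpages,
           s.last_analyze, s.last_autoanalyze,
           s.n_live_tup, s.n_mod_since_analyze
      FROM pg_class c
      JOIN pg_stat_all_tables s ON s.relid = c.oid
     WHERE c.oid = '"DataRepo_peakdata"'::regclass;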

However, the forum thread also claims this:

    And here is the Django 4.2 and postgres 13 output when running just
    the one test (which is always fast):

Which seems weird. But maybe the table is modified/updated by the tests,
so a single test might see much less data, or something? Not sure.

However, there's one more interesting bit - the 'weird error' seen after
placing VACUUM FULL ANALYZE into the test setup/teardown is this:

    psycopg2.errors.DataCorrupted: missing chunk number 0 for toast
    value 2884552 in pg_toast_2619

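FWIW the toast table named in the error belongs to pg_statistic (OID
2619), i.e. the planner statistics themselves, which can be confirmed
with:

    SELECT 2619::regclass;   -- pg_statistic
    SELECT relname FROM pg_class
     WHERE reltoastrelid = 'pg_toast.pg_toast_2619'::regclass;

So the corruption hits the stats catalog, which fits a test suite that
keeps rewriting statistics via VACUUM FULL ANALYZE.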
That seems unrelated to the planning issue, but it definitely seems like
a bug. Interestingly enough, we got a report [1] about the same error on
pgsql-hackers a couple of days ago. That error affected PG11-16, which
seems to be consistent with this (PG13).

[1]
https://www.postgresql.org/message-id/CALdSSPhmqoN02ciT4UxS6ax0N84NpRwPWm87nKJ_%2B0G-Na8qOQ%40mail.gmail.com

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
explains.txt text/plain 6.4 KB
