Re: BUG #18177: certain queries under certain contexts take multiple orders of magnitude longer compared to v10

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Robert Leach <rleach(at)princeton(dot)edu>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18177: certain queries under certain contexts take multiple orders of magnitude longer compared to v10
Date: 2023-11-03 00:23:38
Message-ID: CAApHDvqdPv9RV6G+Ua5S6ADpR1U7vuEtGV_7COvLXnnrbv8OXw@mail.gmail.com
Lists: pgsql-bugs

On Fri, 3 Nov 2023 at 09:04, Robert Leach <rleach(at)princeton(dot)edu> wrote:
> > Nothing looks particularly bug like so far. It seems the
> > pg_class.reltuples estimate for this relation is way out.
>
> If this is operating as intended, there are a couple questions I have:
>
> 1. In postgres 10, the query always runs in under 1 second, regardless of the context. In postgres 13, depending on the context, why does it almost always take over 200 seconds?

The reason the query is slow is that the planner thinks that there's
only 1 row in the DataRepo_peakdata table. In reality, when you
executed that query there were 7896 rows.

Because the planner thinks there's 1 row, it opts to use the fastest
method available to join two relations where one side has just 1 row:
a Nested Loop. Unfortunately, it's a risky plan, as underestimations
have quadratic repercussions. In the v13 query, the problem shows up
at this point in the plan:

->  Nested Loop  (cost=0.62..37.47 rows=1 width=8) (actual time=10492.455..202323.935 rows=8 loops=1)
      Join Filter: ("DataRepo_peakgroup".id = "DataRepo_peakdata".peak_group_id)
      Rows Removed by Join Filter: 7888
      ->  Index Scan using "DataRepo_peakdata_peak_group_id_4dd87f4a" on "DataRepo_peakdata"  (cost=0.25..8.26 rows=1 width=8) (actual time=0.017..7.149 rows=7896 loops=1)
      ->  Nested Loop  (cost=0.38..29.20 rows=1 width=4) (actual time=2.942..25.621 rows=1 loops=7896)

You can see the planner thought the index scan would produce 1 row;
7896 rows were found during execution.

Almost all of the query's total runtime can be accounted for by this
plan fragment. The average execution time for one loop of that 2nd
Nested Loop quoted above is 25.621 milliseconds. Multiplying that by
the number of loops (7896) gives 202303.416 milliseconds.

Because the scan on DataRepo_peakdata contains no clauses, the row
estimate here is derived from the pg_class statistics for that index.
VACUUM and ANALYZE update these pg_class columns when they run so that
the planner has a recent estimate of the number of records in the
table. The estimated tuple count for a clause-less scan of an index
doesn't come directly from pg_class.reltuples, though: the planner
divides reltuples by relpages to get the average tuples per page, then
applies that average to the *current* number of pages in the index.

It would be interesting to see what pg_class reltuples and relpages
are set to for this index.
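
Something along these lines would show both the raw stats and the
derived estimate (a sketch only: it uses the index name from your plan
and approximates the density-times-current-pages arithmetic, ignoring
the clamping the real code does):

    SELECT reltuples, relpages,
           pg_relation_size(oid) / current_setting('block_size')::int AS cur_pages,
           reltuples / NULLIF(relpages, 0)
               * (pg_relation_size(oid) / current_setting('block_size')::int) AS est_rows
    FROM pg_class
    WHERE relname = 'DataRepo_peakdata_peak_group_id_4dd87f4a';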

> 2. In postgres 13, the query runs in under 1 second if I run just the one test that (starting from an empty database) loads data and runs the query. BUT, if I run that one test after having run tests in another python/django test class (which loads data, runs various tests, then destroys all the data that that test class loaded), that query takes over 200 seconds to run. Why would that be?

I don't know what these tests do. If they update some data in the
tables, then perhaps the index becomes bloated, which changes the
number of pages in the index and, with it, the estimate of the number
of tuples in the index.

It would be good to get the output of
pg_relation_size('DataRepo_peakdata_peak_group_id_4dd87f4a') before
and after the tests run.
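
For example (note the inner double quotes, needed because the name
contains upper-case characters):

    SELECT pg_relation_size('"DataRepo_peakdata_peak_group_id_4dd87f4a"');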

> I understand that the above involves variables unrelated to postgres (i.e. Django test classes), but if the only thing I change is postgres (v10 versus v13), v10 is always fast and v13 has this context-dependent slowness and I have not been able to find a problem in the python code after having investigated this now for a week and a half.

Going by your EXPLAIN output, I think we can confirm the problem is
the reltuples value that's calculated in PostgreSQL's
estimate_rel_size() function. It's just a question of finding out
*why* the reltuples value is not close to reality. An incorrect value
does not necessarily mean a PostgreSQL bug; stale stats could be an
explanation. Let's dig further to find out.

> We have been endeavoring to keep our code database-architecture-independent, and as far as I understand the suggestions made by the Django folks, it seems like running the vacuum command in key places is potentially a strategy to deal with this issue (even though I haven't been able to get it to consistently work yet), but doing that would tie the codebase to a particular database architecture. Are there alternate solutions that do not involve altering our Django codebase?

We need to find out the problem first. Perhaps more aggressive
autovacuum settings could help, but we've not gotten far enough to
tell yet.

I suspect the cause is that running the tests makes a whole load of
database changes in a short space of time, perhaps faster than
autovacuum can react to them.
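
If that turns out to be the case, per-table storage parameters are
one way to make autovacuum ANALYZE the table sooner. The values here
are purely illustrative, not a recommendation yet:

    ALTER TABLE "DataRepo_peakdata" SET (
        autovacuum_analyze_scale_factor = 0.02, -- ANALYZE after ~2% of rows change
        autovacuum_analyze_threshold = 100      -- plus a base threshold of 100 rows
    );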

> > Has autovacuum gotten to this table recently?
>
>
> I only just learned the other day about the "VACUUM FULL ANALYZE" command and have been trying to manually run it at different points with mixed outcomes WRT speed. I don't know what the difference between that and "autovacuum" is.

Autovacuum will never do VACUUM FULL. It'll do various forms of
VACUUM and also ANALYZE.
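
For correcting a stale row estimate, a plain ANALYZE is enough and,
unlike VACUUM FULL, it doesn't rewrite the table or hold an exclusive
lock while it runs:

    ANALYZE "DataRepo_peakdata";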

> > select * from pg_stat_user_tables where relid = '"DataRepo_peakdata"'::regclass;
>
> When I try to execute this command, I get:
>
> psycopg2.errors.UndefinedTable: relation "DataRepo_peakdata" does not exist

You'd need to be connected to the database that the table exists in.
Your EXPLAIN output had extra formatting in it, so I had assumed the
quotes were there because the table was created with some upper-case
characters; PostgreSQL folds unquoted identifiers to lower-case.
Perhaps you can try without the quotes if you still get that error
when connected to the correct database.
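
For example, something like:

    SELECT current_database();
    SELECT * FROM pg_stat_user_tables
    WHERE relid = 'DataRepo_peakdata'::regclass; -- unquoted, so folded to lower-case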

> > The plan would likely come good if you analyzed that table. You
> > should see if you can figure out why autovacuum hasn't analyzed it.
>
> How do I determine that? Also, I have tried inserting `vacuum full analyze` commands at different points in the code, and it doesn't appear to consistently speed things up. In fact, before I started attempting vacuuming, there are occasional instances where it randomly ran fast without me changing anything, i.e. the problem intermittently goes away. Though like I said, I've been working on solving this for a week and a half, so I could be mistaken. I've tried so many things and I have been switching back to v10 occasionally, so it could be that I was on v10, thinking I was on v13 when I saw it run fast.

Perhaps it goes away after autovacuum runs ANALYZE on the table
again. I suspect this problem depends on the timing of autovacuum: you
might see the tests run fast more often if you lowered
autovacuum_naptime to 1 second, and more consistently run slowly if
you raised it above its current value.
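
autovacuum_naptime can be changed without a restart; as an
illustration:

    ALTER SYSTEM SET autovacuum_naptime = '1s';
    SELECT pg_reload_conf();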

David
