From: | Masahiro Ikeda <ikedamsh(at)oss(dot)nttdata(dot)com> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Recheck if ANALYZE is needed after VACUUM finishes by autovacuum |
Date: | 2025-02-05 06:31:04 |
Message-ID: | 580ab0777864e3a09951824ffed1ad21@oss.nttdata.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
I would like to propose adding logic to recheck whether an ANALYZE is
necessary at the end of VACUUM. I believe this could be a viable idea
to mitigate a problem encountered by my customer.
The customer’s issue occurred as follows:
1. Every 5 minutes, a large amount of data (2 million records) was
updated via DELETE/COPY operations.
2. Autovacuum executed VACUUM/ANALYZE, and the VACUUM process
took 2 hours to complete.
3. Before the next autovacuum run (autovacuum_naptime=1min), no
further DELETE/COPY operations occurred, leaving mod_since_analyze=0.
4. Autovacuum then ran VACUUM without ANALYZE, taking another 2 hours
to complete.
5. At the end of this VACUUM, pg_class.reltuples was updated to a very
small value (150), far lower than the actual row count (2 million).
6. The execution plan changed, resulting in a significant degradation
in query performance (500ms → 1000s). The bitmap index scan node was
estimated to return 30 rows but actually returned 50 million rows.
7. Query performance remained degraded until the next autovacuum run,
which finally executed ANALYZE. However, since the next VACUUM would
also take 2 hours, performance degradation persisted during that
period.
As you may already be thinking, there are ways to address or mitigate
this
issue depending on the application requirements and hardware
configuration.
For example, using TRUNCATE or adjusting parameters like cost_delay to
speed
up VACUUM can prevent table bloat and avoid significant query
performance
degradation. However, certain system requirements may make these
solutions
impractical. Even in such cases, if there is a way to mitigate the
problem,
it would be worth adopting.
Therefore, I decided to propose $SUBJECT.
VACUUM updates pg_class.reltuples based on the state of tuples it
observes
during processing. However, it cannot observe tuples stored in pages
that
have already been vacuumed, which causes an issue. For example, in cases
like
this, there is a possibility that pg_class.reltuples may be updated to a
value
significantly smaller than the actual row count. To address such
situations,
I am considering introducing a mechanism to recheck whether an ANALYZE
is
necessary, particularly when a VACUUM process takes a long time. This
would
ensure that ANALYZE is executed promptly in such cases.
What do you think about this approach?
This approach cannot completely prevent the issue. At the very least,
between
the completion of VACUUM and the subsequent execution of ANALYZE,
pg_class.reltuples will contain highly inaccurate values. However, it
can
significantly reduce the time during which the problem occurs, thereby
mitigating the impact of incorrect statistics.
As a potential complete solution, we could consider skipping the update
of
statistics by VACUUM when an immediate ANALYZE is expected. However, I
see
this as a next step after adding the logic to recheck the necessity of
ANALYZE.
Attached are two files:
* reproduce_small_reltuples.tar.gz: Steps to reproduce the issue where
reltuples
is updated to a small value at the end of VACUUM.
* v1-0001-Add-logic-to-recheck-if-ANALYZE-is-needed-after-V.patch: A
patch
implementing the proposed logic to recheck the necessity of ANALYZE
after VACUUM.
Concerns about the v1 patch: There may be other issues or better ideas,
but here are my current concerns:
* The number of accesses to pg_class increases, which could lead to
performance issues, although I haven’t conducted performance tests.
* When checking the necessity of ANALYZE, should the latest parameter
values
be referenced? The current implementation uses the parameter values at
the
time of VACUUM execution, except for pg_class.reltuples.
I welcome your feedback. I’m open to alternative ideas and not stick to
this particular approach.
Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION
Attachment | Content-Type | Size |
---|---|---|
v1-0001-Add-logic-to-recheck-if-ANALYZE-is-needed-after-V.patch | text/x-diff | 15.9 KB |
reproduce_small_reltuples.tar.gz | application/gzip | 2.9 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Amit Kapila | 2025-02-05 06:38:26 | Re: Fix assert failure when decoding XLOG_PARAMETER_CHANGE on primary |
Previous Message | Amit Kapila | 2025-02-05 06:30:32 | Re: Conflict detection for update_deleted in logical replication |